On July 11-13, 2022, the 17th International Conference on Software Technologies (ICSOFT 2022) was held in Lisbon, Portugal, bringing together researchers, engineers and practitioners interested in software technologies. The research team was represented by Cosmin Strilețchi, Madalina Chitez, and Karla Csürös, who presented the paper "Building Roger: Technical Challenges While Developing a Bilingual Corpus Management and Query Platform".
This paper presents an approach to a bilingual corpus query system. ROGER has been designed and implemented as a cross-platform distributed web application. The backend interface, available to authenticated administrators, provides the digital tools for managing the texts stored in the database and their associated metadata, and also offers an extensive statistics mechanism that covers data composition and usage (words, characters, languages, study levels, genres, domains and n-grams). The frontend is available to registered users, who can search for specific keywords and refine the results by applying a series of filters. Current platform features include searching for terms and phrases, n-gram distributions and statistical visualizations for performed queries. After entering a search term or phrase, the user may filter the available texts by: (i) language (English, Romanian); (ii) student genre (currently 20 genres); (iii) study year (1 through 4); (iv) level (BA, MA or PhD); (v) discipline (currently 8 disciplines) and (vi) gender (male, female or unknown). A series of solutions has been implemented to improve the response times of the computationally intensive procedures that manipulate large amounts of data.
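To make the search-and-filter workflow described above more concrete, here is a minimal Python sketch of a keyword search refined by metadata filters, followed by an n-gram count over the results. It is an illustration only and does not reproduce ROGER's actual implementation: the `Text` record, the `search` and `ngram_counts` functions, the field names and the sample texts (including the genre and discipline values) are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Text:
    # Hypothetical record; the fields mirror the six filters listed above.
    content: str
    language: str    # "English" or "Romanian"
    genre: str
    study_year: int  # 1 through 4
    level: str       # "BA", "MA" or "PhD"
    discipline: str
    gender: str      # "male", "female" or "unknown"


def search(texts, term, **filters):
    """Return texts containing `term` whose metadata match every filter."""
    needle = term.lower()
    return [
        t for t in texts
        if needle in t.content.lower()
        and all(getattr(t, field) == value for field, value in filters.items())
    ]


def ngram_counts(texts, n):
    """Count word n-grams across the given texts (whitespace tokenization)."""
    counts = Counter()
    for t in texts:
        words = t.content.lower().split()
        counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts


# Made-up sample data, for illustration only.
corpus = [
    Text("The results show a clear trend", "English", "essay", 2, "BA", "Economics", "female"),
    Text("Rezultatele arată o tendință clară", "Romanian", "essay", 2, "BA", "Economics", "male"),
    Text("The results show no effect", "English", "report", 1, "MA", "Computer Science", "unknown"),
]

hits = search(corpus, "results", language="English")
bigrams = ngram_counts(hits, 2)
```

A linear scan like this would not scale to a large corpus; a real platform would presumably rely on database indexes or precomputed statistics, which is the kind of response-time concern the paper mentions.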
More information about the paper can be accessed here.
Excerpts from the presentation: