Faria, P., Galves, C. & Magro, C. (2023) Syntactic annotation for Portuguese corpora: standards, parsers, and search interfaces. Lang Resources & Evaluation. https://doi.org/10.1007/s10579-023-09699-4

Abstract

In the last two decades, four Portuguese syntactically annotated corpora were built along the lines initially defined for the Penn Parsed Historical Corpora (Santorini, 2016). They cover the old, the middle, the classical and the modern periods of European Portuguese, as well as the nineteenth and twentieth century Brazilian Portuguese, and include different textual genres and oral discourse excerpts. Together they provide a fundamental resource for the study of variation and change in Portuguese. In the last years, an effort was made to maximally unify the annotation scheme applied to those corpora, in such a way that the searches done on one corpus could be done in exactly the same manner on the others. This effort resulted in the Portuguese Syntactic Annotation Manual (Magro & Galves, 2019). In this paper, we present the syntactic annotation for the Portuguese Corpora. We describe the functioning of ParsPort, a rule-based parser which makes use of the revision mode of the query language Corpus Search (Randall, 2005–2015). We argue that ParsPort is more efficient to our annotation efforts than the probabilistic parser developed by Bikel (2004), previously used for the syntactic annotation of the Portuguese Corpora. Finally we mention recent advances towards more user-friendly tools for syntactic searches.