Computer Corpus of the Galician Language

Lidia López Teixeiro
César Osorio Peláez

The Computer Corpus of the Galician Language (Tesouro Informatizado da Lingua Galega, or TILG) is a lemmatized, parsed corpus of modern Galician developed at the Instituto da Lingua Galega under the direction of Antón Santamarina. Its most recent version, online since 28 November 2010, covers 1,897 works by 730 authors, written between 1612 and 2010, and is searchable through an integrated database of over 25 million words grouped under 165,000 lemmas. TILG supports two types of search, by word or by headword (lemma), each of which can be refined through various advanced settings.

Current work on the project aims mainly to enrich the Tesouro by incorporating new texts, to increase its usefulness by developing new search tools, and to enhance the search results with complementary statistical information.

Secretaría Xeral de Política Lingüística da Consellería de Cultura, Educación e Ordenación Universitaria (under an agreement with the USC)