Tesouro Informatizado da Lingua Galega


The Tesouro Informatizado da Lingua Galega lets you search for words, tags and headwords in all its texts. The search interface is opened from the Search tab on the menu bar and offers two search modes: simple, where you just type in the graphic word or headword you want to search for, and advanced, where you can specify further search parameters, as explained below.

1. Simple search

The simple search option lets you search for either a graphic word or a headword (i.e. a word as it would be listed in a dictionary). Select the kind of search you want and type a word or a headword into the search box. After you click on Search, the application brings up a concordance.

For example, to search for the headword cobiza, choose simple search from the pull-down menu, choose Headword as your Type of search, type cobiza in the search box (under Text), and click on the Search button. As seen in the screen capture, this produces a chronologically ordered concordance of every word whose headword is cobiza (inflected forms and variants), e.g. cobiza, cobisa, cobiça, cubiza, cobicia, codicia, cobizas, cubizas etc.

[Screenshot: concordance for the headword cobiza]

If you want to search for the specific form cubiza, choose the Graphic word search type, type in cubiza and click on Search. This time a chronologically ordered concordance is displayed containing every occurrence in the corpus of the specific form cubiza:

[Screenshot: concordance for the graphic word cubiza]

You can search for sequences of up to five consecutive graphic words or headwords using simple search. For example, if you want to search for occurrences of the phrase ter gana de leria, type it in the search box and select Headword as the Type of search. This will return a concordance of variants of the phrase, including ter (teño, tes, tiña...) gana de leria, ter (teño, tes, tiña...) ganas de leria, ter (teño, tes, tiña...) gana de lerias etc.

[Screenshot: concordance for the headword sequence ter gana de leria]

If, on the other hand, you specifically want to find an invariable expression like meu can pillou unha mosca, type it into the search box and choose Graphic word as the Type of search:

[Screenshot: concordance for the graphic word sequence meu can pillou unha mosca]

In all these examples, you can view the wider context of any one search result by clicking on the number at the beginning of a line in the table. The context containing the item will be widened, giving fuller information about the source text, like this:

[Screenshot: expanded context for a result of meu can pillou unha mosca]

2. Advanced search

The advanced search form is made up of five blocks or sections: Type of search, Results, Sensitivity, Filters and Search items. See the picture:

[Screenshot: the advanced search form]

2.1. Type of search

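Use the Type of search section to choose what type of item(s) you wish to search for. The options referred to in this guide are Headwords / Grammatical units (the default), Graphic words, and Proximity search (Headwords / Grammatical units).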

2.2. Results

You can specify the kind of search results you want displayed and how to display them in the Results section, which has the following settings:

2.3. Sensitivity

The Sensitivity section is used to specify whether written accents are taken into account in the search, and whether the search is case-sensitive. If you set Case to No, a search for rosa ignores capitalization and returns both rosa and Rosa; if you set Case to Yes, the same search returns only lower-case rosa (not Rosa). If you set Accents to No, a search for mais returns forms both with and without an accent (mais and máis); if you set Accents to Yes, the results include only the unaccented form (mais).
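The effect of the two switches can be sketched in Python as a normalisation step applied to both the query and the corpus form before they are compared. This is only an illustration of the behaviour described above, not the application's actual implementation; the function names are hypothetical, and the decision to keep ñ and ç distinct when accents are ignored is an assumption.

```python
import unicodedata

# Combining marks kept even when accents are ignored (assumption: ñ and ç
# are treated as distinct letters, not as accented n and c).
KEPT_MARKS = {"\u0303", "\u0327"}  # combining tilde, combining cedilla


def fold(text, case_sensitive, accent_sensitive):
    """Normalise a word according to the two Sensitivity switches."""
    if not accent_sensitive:
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(
            ch for ch in decomposed
            if not unicodedata.combining(ch) or ch in KEPT_MARKS
        )
    text = unicodedata.normalize("NFC", text)
    if not case_sensitive:
        text = text.casefold()
    return text


def same_word(query, form, case_sensitive=False, accent_sensitive=False):
    """Return True if a corpus form would count as a hit for the query."""
    return (fold(query, case_sensitive, accent_sensitive)
            == fold(form, case_sensitive, accent_sensitive))


print(same_word("rosa", "Rosa"))                         # True  (Case: No)
print(same_word("rosa", "Rosa", case_sensitive=True))    # False (Case: Yes)
print(same_word("mais", "máis"))                         # True  (Accents: No)
print(same_word("mais", "máis", accent_sensitive=True))  # False (Accents: Yes)
```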

2.4. Filters

With filters you can limit a search to the type of text you are interested in. You can filter by genre (novels, letters, poetry etc.), language variety (dialectal or non-dialectal), medium (books or periodicals), gender of the author, channel (oral or written) or chronological period. There is also a filter for limiting searches to proper names.

2.5. Search items

The Search items section is where you type in what you want to search for and can specify the grammatical unit, tag, headword or graphic word you want the item to be associated with. You needn’t know the codes to specify a tag: all you need to do is click on the question mark to the right of the search box and use the drop-down menu.

You can create a new line by clicking on the + sign on the right and search for several items at the same time.

3. Advanced search: examples

Here are some examples of how to use advanced search.

3.1. Simple and complex searches for headwords and grammatical units

For this kind of search, the Type of search selected should be Headwords / Grammatical units. This is the default search setting.

If you type desfacer in the Headword field, you will obtain a concordance of occurrences of inflected forms of the verb desfacer, including its variants, such as desfacer, desfaga, desfixo, desfaer, desfaguendo etc. as in the next picture:

[Screenshot: concordance for the headword desfacer]

If you are looking for second-person-plural past indicative forms of verbs, pick the POS tag option by clicking on the question mark on the right, then choose Verb, Indicative, Past, 2nd person, Plural:

[Screenshot: POS tag selector set to Verb, Indicative, Past, 2nd person, Plural]

This will generate a concordance of forms in the corpus fulfilling the conditions you have stipulated (words such as empregastes, fundastes, labrastes, librastes, nacistes...):

[Screenshot: concordance for the POS tag search]

If you would like to limit the search to second-person-plural forms of the past indicative of the verb nacer, also type in nacer in the Headword field. The resulting concordance will only show forms of nacer (nacistes, nacestes, nacéstedes, nacíchedes etc.):

[Screenshot: concordance for the POS tag search restricted to the headword nacer]

To concordance all cases of the past indicative of falar followed by any preposition, specify the search parameters for the past indicative of falar and then create a new line (by clicking on the + sign on the right) and select Preposition in the POS tag field:

[Screenshot: second search line with Preposition selected in the POS tag field]

This will produce a concordance containing sequences like falei de, falei por, falei con etc.:

[Screenshot: concordance for the past indicative of falar followed by a preposition]

To obtain a concordance of all occurrences of the form podo belonging to the verb poder (i.e. not including forms of the verb podar, which can also be podo), type both poder in the Headword field and podo in the Graphic word or Grammatical unit field:

[Screenshot: search for the form podo with the headword poder]
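The logic of the searches in this section can be sketched in Python: each line of the Search items section is a set of conditions on one token, and consecutive lines must match consecutive tokens. The token structure, the field names (form, headword, pos) and the tag labels below are hypothetical, chosen only for illustration; they are not the application's internal format, and tense is left out for brevity.

```python
def token_matches(token, line):
    """A token satisfies a search line if every filled-in field agrees."""
    return all(token.get(field) == value for field, value in line.items())


def find_sequences(tokens, lines):
    """Return start positions where consecutive tokens satisfy consecutive lines."""
    width = len(lines)
    return [
        start
        for start in range(len(tokens) - width + 1)
        if all(token_matches(tokens[start + k], lines[k]) for k in range(width))
    ]


# Invented mini-corpus; the annotations are illustrative only.
tokens = [
    {"form": "podo", "headword": "poder", "pos": "verb"},
    {"form": "ir", "headword": "ir", "pos": "verb"},
    {"form": "podo", "headword": "podar", "pos": "verb"},
    {"form": "a", "headword": "o", "pos": "article"},
    {"form": "viña", "headword": "viña", "pos": "noun"},
    {"form": "falei", "headword": "falar", "pos": "verb"},
    {"form": "de", "headword": "de", "pos": "preposition"},
    {"form": "ti", "headword": "ti", "pos": "pronoun"},
]

# One line: the form podo, but only as a form of poder (not podar).
print(find_sequences(tokens, [{"headword": "poder", "form": "podo"}]))  # [0]

# Two lines: a form of falar immediately followed by any preposition.
print(find_sequences(tokens, [{"headword": "falar"}, {"pos": "preposition"}]))  # [5]
```

Leaving a field out of a line corresponds to leaving the matching box empty in the form: the fewer fields are filled in, the broader the search.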

3.2. Simple and complex searches for word forms

If you select Graphic words under Type of search in the Advanced search form, notice how the Search items section is replaced by a single search box titled Text. You can type a sequence of up to five word forms here to obtain a concordance of all the examples of this specific sequence. So if you type a miúdo in the Text box the application will return this:

[Screenshot: concordance for the sequence a miúdo]

3.3. Proximity searches for graphic words, headwords or grammatical units

You can use the Advanced search form to find examples of texts where two graphic words, headwords or grammatical units occur together at a certain distance from each other; the distance specified may be anything up to ten words or components away. Here’s how.

Say you want to find all the occurrences of the verb roubar followed by a noun at a distance of three words or fewer. First, select Proximity search (Headwords / Grammatical units) in the Type of search box. Type roubar in the first element’s Headword field, set POS tag to Common noun in the second element and check ≤3 in the distance field (on the far right). This will give the following result:

[Screenshot: concordance for roubar followed by a common noun at a distance of up to three]

This kind of search is sensitive to the order in which elements are specified, so if you wanted to find examples in which a noun occurs to the left of roubar you would need to redo the query, switching the order of the search fields in Search items.
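As a rough Python sketch of an ordered proximity search: the first element must occur to the left of the second, and the second must occur within the chosen distance. The token structure and tag names are hypothetical, and the way distance is counted (an immediately following token is at distance 1) is an assumption rather than a documented detail.

```python
def proximity_hits(tokens, first, second, max_distance):
    """Return (i, j) pairs where first(tokens[i]) and second(tokens[j]) hold
    and tokens[j] lies 1..max_distance positions to the right of tokens[i]."""
    hits = []
    for i, token in enumerate(tokens):
        if not first(token):
            continue
        last = min(i + max_distance, len(tokens) - 1)
        for j in range(i + 1, last + 1):
            if second(tokens[j]):
                hits.append((i, j))
    return hits


def is_roubar(token):
    return token["headword"] == "roubar"


def is_common_noun(token):
    return token["pos"] == "common noun"


# Invented example sentence with illustrative annotations.
tokens = [
    {"form": "roubaron", "headword": "roubar", "pos": "verb"},
    {"form": "todo", "headword": "todo", "pos": "determiner"},
    {"form": "o", "headword": "o", "pos": "article"},
    {"form": "diñeiro", "headword": "diñeiro", "pos": "common noun"},
]

print(proximity_hits(tokens, is_roubar, is_common_noun, 3))  # [(0, 3)]
print(proximity_hits(tokens, is_roubar, is_common_noun, 2))  # []
# Swapping the two elements searches for a noun to the LEFT of roubar.
print(proximity_hits(tokens, is_common_noun, is_roubar, 3))  # []
```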

4. Downloading search results

The concordances obtained through either simple or complex queries can be downloaded in TSV format (tab-separated values, similar to CSV but with fields separated by a tab character). Just click on the form’s Download button.

Note that only the page of search results currently displayed is downloaded. In Simple search, the output is set to 50 results per page. In Advanced search, the number of results per page can be adjusted to display a larger number of examples.

When you click on Download, a dialog box is displayed where you can choose between Simplified TSV and Full TSV. The main difference is that the Full TSV option includes information about whether or not a grammatical unit or graphic word is part of a proper name.

The first two lines of the downloaded file give the parameters of the search (the first line lists the field names of the search and the second the values of each field). This is followed by the names of the search result fields and the results themselves.
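A downloaded file with this layout could be read in Python along the following lines. This is a minimal sketch: the function name is hypothetical, the column names and values in the sample are invented placeholders (the real field names are whatever the download contains), and the file is assumed to have at least three lines.

```python
import csv
import io


def parse_download(text):
    """Split a downloaded TSV into (search parameters, list of result rows)."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter="\t", quoting=csv.QUOTE_NONE)
    lines = [row for row in reader if row]  # skip blank lines
    # Line 1: search field names; line 2: their values;
    # line 3: result field names; remaining lines: the results.
    param_names, param_values, result_fields, *results = lines
    params = dict(zip(param_names, param_values))
    rows = [dict(zip(result_fields, row)) for row in results]
    return params, rows


# Invented sample with placeholder column names and values.
sample = (
    "search_type\ttext\n"
    "headword\tcobiza\n"
    "left_context\tmatch\tright_context\n"
    "a sample left\tcobiza\ta sample right\n"
)

params, rows = parse_download(sample)
print(params)            # {'search_type': 'headword', 'text': 'cobiza'}
print(rows[0]["match"])  # cobiza
```

To read an actual download, open the file with the encoding it was saved in and pass its contents to the function.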

5. Further remarks

5.1. Enclitic pronouns

A graphic word is defined as any sequence of alphanumeric characters between spaces and/or punctuation marks.

However, if you want to search for a verb form followed by a certain pronominal clitic (such as in dicirlle) or a series of clitics (such as in dicírllelo) in a graphic word search, insert a + symbol between the verb and the first clitic, like this: dicir+lle, dicir+llelo.
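The definition of a graphic word given above can be sketched in Python as a simple tokenizer. The regular expression is an approximation written for illustration, not the application's own segmentation rule. It shows why a verb with its clitics counts as a single graphic word, which is the reason the + notation is needed to address the clitic part separately.

```python
import re
import unicodedata

# One or more alphanumeric characters (letters or digits, no underscore).
GRAPHIC_WORD = re.compile(r"[^\W_]+")


def graphic_words(text):
    """Split text into graphic words, dropping spaces and punctuation."""
    # Compose accented letters first so that "í" is a single character.
    return GRAPHIC_WORD.findall(unicodedata.normalize("NFC", text))


print(graphic_words("Quero dicírllelo hoxe, sen falta."))
# ['Quero', 'dicírllelo', 'hoxe', 'sen', 'falta']
```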

In a search for headwords and grammatical units, a sequence of a verb form plus one or more clitics should be broken down into its constituent parts, placing each part on its own line. Use the + button to add lines as needed. For example, to search for dicírllelo in a Headwords / Grammatical units search, type dicir in the box for the first component, lles on the next line and lo on the third line, like this:

[Screenshot: search lines for dicir, lles and lo]

5.2. Contractions

Contracted forms can be searched for directly as graphic words. To find occurrences of the contraction cunha, you can simply type cunha into the search box.

In searches for headwords and grammatical units, however, contracted forms must be broken down into their parts. To find combinations of the preposition con with the indefinite article unha, type these on separate lines (use the + button to add a line):

[Screenshot: search lines for con and unha]

If you wish to search for all sequences consisting of the preposition con and a form of the indefinite article, make con the first grammatical unit, and un the headword on the second line:

[Screenshot: search lines for the grammatical unit con and the headword un]
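The relationship between a contracted graphic word and the grammatical units typed on separate lines can be illustrated with a small Python lookup. The table below is a hand-written sample covering only contractions of con with the indefinite article; it is hypothetical and says nothing about how the corpus itself stores or segments contractions.

```python
# Illustrative sample only: contraction -> (preposition, article).
CONTRACTIONS = {
    "cun": ("con", "un"),
    "cunha": ("con", "unha"),
    "cuns": ("con", "uns"),
    "cunhas": ("con", "unhas"),
}


def grammatical_units(words):
    """Expand known contractions; leave every other graphic word unchanged."""
    units = []
    for word in words:
        units.extend(CONTRACTIONS.get(word.lower(), (word,)))
    return units


print(grammatical_units(["falou", "cunha", "amiga"]))
# ['falou', 'con', 'unha', 'amiga']
```

In this picture, a graphic word search addresses the items of the input list, while a Headwords / Grammatical units search addresses the items of the output list, one per line.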

5.3. Expanded context

You can expand the context of an example by clicking on the number to the left of the concordance line (the number indicating its place in the list).

The way contexts are expanded depends on which search mode is active. If the concordance has been generated from a graphic word search, the expanded context will display a text fragment with a header containing data about the source text:

[Screenshot: expanded context for a graphic word search]

If this is a search for headwords or grammatical units, the expanded context will present the text with POS tags and corresponding headwords for each item:

[Screenshot: expanded context with POS tags and headwords]

5.4. Wildcards and operators

The wildcards * and ? may be used in any search field. * represents any number of letters (or even none). ? represents any single letter.

For example, to search for all words beginning with des, type des* in the Headword field:

[Screenshot: search for des* in the Headword field]

To search for either of the variant forms perdiches ~ perdeches, type perd?ches in the Text field:

[Screenshot: search for perd?ches in the Text field]
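The meaning of the two wildcards can be made concrete with a short Python sketch that translates a pattern into a regular expression. The function names are hypothetical and the translation is only an approximation of the behaviour described above (any number of letters for *, exactly one letter for ?); the application's own matching may differ in detail.

```python
import re

LETTER = r"[^\W\d_]"  # any single letter, accented letters included


def wildcard_to_regex(pattern):
    """Translate * and ? into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(LETTER + "*")   # zero or more letters
        elif ch == "?":
            parts.append(LETTER)         # exactly one letter
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts))


def wildcard_match(pattern, word):
    """Return True if the whole word fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None


print(wildcard_match("des*", "desfacer"))        # True
print(wildcard_match("des*", "des"))             # True (* may match nothing)
print(wildcard_match("perd?ches", "perdeches"))  # True
print(wildcard_match("perd?ches", "perdches"))   # False (? needs one letter)
```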

The pipe character (|) and the exclamation mark (!) may be used in search fields as equivalents of the Boolean operators OR and NOT, respectively.

To search for both oín and ouvín at the same time, type oín|ouvín in the search box:

[Screenshot: search for oín|ouvín]

To search for all the words that begin with desenv- except for those which begin with desenvolv-, type desenv*!desenvolv* into the text box:

[Screenshot: search for desenv*!desenvolv*]
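One way to read queries that use | and ! is sketched below in Python. The precedence is an assumption made for the sketch (| separates alternatives, and within an alternative everything after a ! is excluded), as is the treatment of a query that starts with ! as "anything except"; the application's real query parser is not documented here and may behave differently.

```python
import re

LETTER = r"[^\W\d_]"  # any single letter


def wildcard_match(pattern, word):
    """Match a word against a pattern in which * and ? are wildcards."""
    regex = "".join(
        LETTER + "*" if ch == "*" else LETTER if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, word) is not None


def query_match(query, word):
    """Evaluate a query with | (OR) and ! (NOT) against a single word."""
    for alternative in query.split("|"):
        wanted, *excluded = alternative.split("!")
        # An empty 'wanted' part (query starting with !) accepts any word.
        if wanted and not wildcard_match(wanted, word):
            continue
        if any(wildcard_match(pattern, word) for pattern in excluded):
            continue
        return True
    return False


print(query_match("oín|ouvín", "ouvín"))                  # True
print(query_match("desenv*!desenvolv*", "desenvainar"))   # True
print(query_match("desenv*!desenvolv*", "desenvolver"))   # False
print(query_match("!se", "lle"))                          # True
```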

By combining wildcards and operators, you can perform quite complex searches. For instance, this search would produce a concordance for occurrences of the verb desenvolver when it is not followed by a pronoun:

[Screenshot: combined search for desenvolver not followed by a pronoun]

And here is how to get a list of all occurrences of the verb desenvolver followed by any personal pronoun except se:

[Screenshot: combined search for desenvolver followed by a personal pronoun other than se]