Difference between revisions of "Gaelic/Using Corpas na Gàidhlig"

From Celtic Languages

Revision as of 17:58, 20 November 2022

Corpas na Gàidhlig (https://dasg.ac.uk/corpus/), also called the DASG corpus, is a publicly available corpus of Scottish Gaelic literary texts: books, newspapers, poems and advertisements. The texts date from 1200 to the 21st century, though most are modern, i.e. from the 19th century and later. It may later be expanded with non-literary texts as well (e.g. transcriptions of recorded folk tales).

The corpus interface is built on the open-source Corpus Workbench (CWB) software and a custom modification of the CQPweb web interface.

The texts in the corpus are annotated with information about the time period they come from, the literary type of the work, its author, etc. Unfortunately, the words are not annotated with part-of-speech tags and there is no meta-information about sentence structure, which somewhat limits the queries that are possible. Still, the interface lets users put wildcards in queries and use the CQP query syntax to build complex ones.
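To give a rough idea of what such queries look like, here is a small Python sketch that turns simple wildcard patterns into CQP token expressions. It only builds query strings and does not talk to the corpus. The helper names (wildcard_to_cqp, phrase_query) are made up for this illustration. It assumes the usual CWB conventions: the word form lives in an attribute called word, %c makes the match case-insensitive, and [] stands for any token. Whether the DASG interface accepts exactly these forms has not been checked here.

```python
# Characters with a special meaning in regular expressions; they are
# backslash-escaped so that they match literally.
REGEX_SPECIAL = set(".^$+()[]{}|\\")


def wildcard_to_cqp(pattern: str, ignore_case: bool = True) -> str:
    """Turn a simple wildcard pattern into one CQP token expression.

    '*' stands for any run of characters, '?' for exactly one character.
    The attribute name 'word' is an assumption (the usual CWB default).
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        elif ch in REGEX_SPECIAL:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    regex = "".join(parts)
    flag = "%c" if ignore_case else ""
    return f'[word="{regex}"{flag}]'


def phrase_query(patterns, max_gap: int = 0) -> str:
    """Join token expressions into a sequence query.

    With max_gap > 0, up to that many arbitrary tokens ('[]') are
    allowed between neighbouring words.
    """
    tokens = [wildcard_to_cqp(p) for p in patterns]
    separator = f" []{{0,{max_gap}}} " if max_gap else " "
    return separator.join(tokens)


if __name__ == "__main__":
    # Any word form starting with "taigh"
    print(wildcard_to_cqp("taigh*"))
    # "anns" followed by such a form, with at most two tokens in between
    print(phrase_query(["anns", "taigh*"], max_gap=2))
```

Since the corpus has no part-of-speech tags, queries of this kind have to rely on word forms and their order alone, which is why wildcards and token gaps do most of the work.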

TODO: fill the rest