collect gold-standard corpora

Open • reynoldsnlp opened this issue 6 years ago • 2 comments

We need a large collection of gold-standard disambiguated Russian texts for FST/CG testing. One way or another, this will require converting tags and format to udar/CG3. Some possibilities include:

reynoldsnlp, Sep 23 '19 00:09

It looks like SynTagRus has now been published in a Universal Dependencies format: https://github.com/UniversalDependencies/UD_Russian-SynTagRus/tree/master

reynoldsnlp, Sep 23 '19 13:09
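Because the UD releases use CoNLL-U (ten tab-separated columns per token: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), the format half of the conversion to CG3 could look roughly like the sketch below. It emits one cohort per token in the CG3 stream layout (`"<wordform>"` followed by a tab-indented `"lemma" TAGS` reading), so each cohort carries exactly the gold reading. The function name `conllu_to_cg3` and the `UPOS_MAP`/`FEAT_MAP` tables are hypothetical: the mapping is only an illustration of the tag-conversion step, not udar's actual tagset or an existing converter, and it would need to be checked against udar before use.

```python
from typing import Iterable, Iterator

# HYPOTHETICAL UD -> udar-style tag mapping, for illustration only.
# Verify every entry against udar's real tagset before relying on it.
UPOS_MAP = {"NOUN": "N", "PROPN": "N", "VERB": "V", "ADJ": "A",
            "ADV": "Adv", "ADP": "Pr", "PUNCT": "PUNCT"}
FEAT_MAP = {
    "Gender=Masc": "Msc", "Gender=Fem": "Fem", "Gender=Neut": "Neu",
    "Number=Sing": "Sg", "Number=Plur": "Pl",
    "Case=Nom": "Nom", "Case=Gen": "Gen", "Case=Dat": "Dat",
    "Case=Acc": "Acc", "Case=Ins": "Ins", "Case=Loc": "Loc",
    "Animacy=Anim": "Anim", "Animacy=Inan": "Inan",
}


def conllu_to_cg3(lines: Iterable[str]) -> Iterator[str]:
    """Convert CoNLL-U lines into CG3 stream lines, one gold reading per cohort."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            yield ""  # assumption: keep sentence boundaries as blank lines
            continue
        if line.startswith("#"):
            continue  # skip sentence-level comments (sent_id, text, ...)
        cols = line.split("\t")
        tok_id, form, lemma, upos, feats = cols[0], cols[1], cols[2], cols[3], cols[5]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword-token ranges and empty nodes
        tags = [UPOS_MAP.get(upos, upos)]
        if feats != "_":
            # Unmapped features pass through verbatim (e.g. "Person=3");
            # tag order follows UD's alphabetical FEATS order, not udar's.
            tags += [FEAT_MAP.get(f, f) for f in feats.split("|")]
        yield f'"<{form}>"'
        yield f'\t"{lemma}" ' + " ".join(tags)


if __name__ == "__main__":
    sample = [
        "# text = Мама спит.",
        "\t".join(["1", "Мама", "мама", "NOUN", "_",
                   "Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing",
                   "2", "nsubj", "_", "_"]),
        "\t".join(["2", "спит", "спать", "VERB", "_",
                   "Number=Sing|Person=3|Tense=Pres", "0", "root", "_", "SpaceAfter=No"]),
        "\t".join(["3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
        "",
    ]
    print("\n".join(conllu_to_cg3(sample)))
```

Features with no entry in the mapping (here `Person=3` and `Tense=Pres`) come through unchanged, which makes gaps in the mapping easy to spot when diffing the converted gold corpus against FST/CG output.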

Other Russian UD treebanks also exist: https://universaldependencies.org/#russian-treebanks

reynoldsnlp, Nov 16 '19 02:11