Boris Orekhov
It depends on what you mean by "extract raw text". Extract from what?
If you mean the [text collection](http://web-corpora.net/ThaiCorpus/texts_tagged.zip), these XML files were converted into a specific corpus format that allows indexing and searching. The converter is [here](https://github.com/nevmenandr/thai-language/blob/master/armenian_engine/armenian_engine.py).
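If the goal is simply plain text from the tagged XML files, one generic option is to drop the markup and keep only the text nodes. The sketch below is not the converter linked above and assumes nothing about the real tag names: the `<text>` and `<w>` elements in the sample are hypothetical, and `xml_to_raw_text` is a name made up for illustration. If the files store annotations as text nodes rather than attributes, those would end up in the output too and would need filtering.

```python
import xml.etree.ElementTree as ET


def xml_to_raw_text(xml_string: str) -> str:
    """Return the text content of an XML document with all tags removed."""
    root = ET.fromstring(xml_string)
    # itertext() yields every text node (element text and tails) in document order.
    joined = "".join(root.itertext())
    # Collapse runs of whitespace (indentation, newlines) into single spaces.
    return " ".join(joined.split())


if __name__ == "__main__":
    # Hypothetical sample; the real corpus files may be structured differently.
    sample = "<text>\n  <w>สวัสดี</w> <w>ครับ</w>\n</text>"
    print(xml_to_raw_text(sample))
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring(...)`; the rest stays the same.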