Python issues: splitting long text with folia2txt and using FLAT in custom software
- I've installed folia-utils and used "folia2txt -s ..." from the CLI to split a long string into sentences. Unfortunately, if I split the Old Slavonic text "Искони бе Слово и Слово бе отъ Бога. и Богъ бе слово." into sentences, I get the wrong answer: "Искони бе Слово и Слово бе отъ Бога. и Богъ бе слово." If I split an English text, it works just fine.
- Is it possible to run FLAT not as a tab in an internet browser, but as a PySide widget? BTW, I can't import folia2html from the foliatools package in my Python script as I did with foliatools.folia2txt, foliatools.foliafreqlist, foliatools.foliatree. Nevertheless, I can run it from the CLI by "python.exe foliatools\folia2txt.py -s myannotation.xml"
- I've installed folia-utils and used "folia2txt -s ..." from the CLI to split a long string into sentences.
folia2txt -s is not a proper sentence splitter; it simply assumes each line of a text file is already its own sentence!
For an actual tokeniser and sentence splitter with rich FoLiA support, consider ucto: https://github.com/LanguageMachines/ucto
Although it has no specific rules for Old Church Slavonic, you can use the generic ruleset (named generic) or the Russian one (tokconfig-rus).
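For use from a Python script, a minimal sketch of calling ucto through its command line could look like the following. It assumes that ucto is installed and on PATH, that its -L option selects a ruleset by name ("generic" or "rus") and that -n writes one sentence per output line; the function names and file paths are hypothetical, so check them against your ucto version before relying on this.

```python
import shutil
import subprocess


def build_ucto_command(input_path, output_path, language="generic"):
    """Build a ucto command line that writes one sentence per line.

    Assumption: -L picks the ruleset by name (e.g. "generic" or "rus")
    and -n makes ucto print one sentence per output line.
    """
    return ["ucto", "-L", language, "-n", input_path, output_path]


def split_sentences(input_path, output_path, language="generic"):
    """Run ucto on a plain-text file; raise if ucto is not installed."""
    if shutil.which("ucto") is None:
        raise RuntimeError("ucto was not found on PATH")
    # check=True raises CalledProcessError if ucto exits with an error.
    subprocess.run(
        build_ucto_command(input_path, output_path, language),
        check=True,
    )
```

Calling split_sentences("slavonic.txt", "sentences.txt", "rus") would then try the Russian ruleset on a hypothetical input file. How well either ruleset handles Old Church Slavonic punctuation is something to verify on your own text.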
- Is it possible to run FLAT not as a tab in an internet browser, but as a PySide widget?
I hadn't heard of PySide widgets until now, so I don't know. I suppose that if there is a Qt widget that embeds a whole web browser, then yes.
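Qt does ship such a widget: QWebEngineView, which embeds a Chromium-based browser. A rough sketch of showing FLAT in it, assuming PySide6 with the QtWebEngine module is installed and a FLAT server is already running, might be as follows. The URL below is only a placeholder for a local development server, and the function name is hypothetical.

```python
import sys

# Hypothetical address of a locally running FLAT server; adjust to your setup.
FLAT_URL = "http://127.0.0.1:8000/"


def show_flat_in_widget(url=FLAT_URL):
    """Open FLAT inside a Qt window instead of a browser tab."""
    # Imported here so this module can be loaded without PySide6 installed.
    from PySide6.QtCore import QUrl
    from PySide6.QtWebEngineWidgets import QWebEngineView
    from PySide6.QtWidgets import QApplication

    app = QApplication(sys.argv)
    view = QWebEngineView()  # a QWidget that embeds a web browser
    view.setUrl(QUrl(url))
    view.resize(1200, 800)
    view.show()
    # Blocks until the window is closed and returns Qt's exit code.
    return app.exec()
```

Calling sys.exit(show_flat_in_widget()) opens the window. Because QWebEngineView is an ordinary QWidget, it could also be placed in a layout of an existing PySide application rather than shown as a top-level window.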
BTW, I can't import folia2html from the foliatools package in my Python script as I did with foliatools.folia2txt, foliatools.foliafreqlist, foliatools.foliatree. Nevertheless, I can run it from the CLI by "python.exe foliatools\folia2txt.py -s myannotation.xml"
Hmm, I see. That should probably be improved, yes.