Rob Reynolds
@Phaqui This is a limitation of the `manylinux` images I use to build the wheels. If @TinoDidriksen's solution doesn't work for you, you may be able to... 1. `git...
Might be better to use `'\u0301'` to mark stress. It is much less likely to occur in input texts, and it is much more readable.
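A minimal sketch of what marking stress with `'\u0301'` (COMBINING ACUTE ACCENT) could look like. The helper names `mark_stress` and `strip_stress` are hypothetical, made up for illustration, and the stress position is assumed to be already known as a character index.

```python
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT, placed right after the stressed vowel


def mark_stress(word: str, vowel_index: int) -> str:
    """Insert the combining accent after the character at vowel_index (hypothetical helper)."""
    return word[: vowel_index + 1] + ACUTE + word[vowel_index + 1:]


def strip_stress(text: str) -> str:
    """Remove every combining acute accent, e.g. to recover the plain input text."""
    return text.replace(ACUTE, "")


stressed = mark_stress("молоко", 5)  # stress on the final "о"
plain = strip_stress(stressed)
```

Because the mark is a separate code point, stripping it is a plain string replacement and the original text comes back unchanged.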
Issue created here: https://github.com/hfst/hfst/issues/448
One year later, this is now an issue for Python 3.9, too.
As for [CoNLL-U format](https://universaldependencies.org/format.html), there does not appear to be any way to represent ambiguity, so the conversion would be lossy.
`mystem` can have ambiguous readings separated by `|` in its output, even with the `-d` (disambiguate) flag:
```bash
$ echo "Мы уже работаем здесь три недели." | mystem3.1 -ind
Мы{мы=SPRO,мн,1-л=им}...
```
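A rough sketch of how a consumer might detect those leftover ambiguous readings. It assumes each token has the shape `form{reading|reading|...}` as in the output above; the function names are hypothetical, and the sample string is made up for illustration, not real `mystem` output.

```python
import re
from typing import List, Tuple

# Assumed token shape: surface form, then readings in braces, separated by "|".
TOKEN_RE = re.compile(r"(?P<form>[^{}\s]+)\{(?P<readings>[^{}]*)\}")


def split_readings(token: str) -> Tuple[str, List[str]]:
    """Return the surface form and the list of readings for one token."""
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"unexpected token format: {token!r}")
    return match.group("form"), match.group("readings").split("|")


def is_ambiguous(token: str) -> bool:
    """True if the token still carries more than one reading."""
    return len(split_readings(token)[1]) > 1


# Made-up sample with two readings; not copied from real mystem output.
sample = "недели{неделя=S,жен,неод=род,ед|неделя=S,жен,неод=им,мн}"
form, readings = split_readings(sample)
```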
And webapps: https://datayze.com/readability-analyzer.php http://www.analyzemywriting.com/
It looks like SynTagRus has now been published in a Universal Dependencies format: https://github.com/UniversalDependencies/UD_Russian-SynTagRus/tree/master
Other UD treebanks also exist: https://universaldependencies.org/#russian-treebanks
Submitted issue to `HFST` about this: https://github.com/hfst/hfst/issues/483. The maximum buffer size appears to be `1024` bytes, so a workaround could check `len(bytes(input_str, encoding='utf8')) < 1000`, and use a regular subprocess...
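The length check could be sketched roughly like this. The name `fits_in_buffer` and the constant are hypothetical; the `1000` threshold is just the margin under the apparent `1024`-byte limit mentioned above, and what to do when the check fails is left to the caller.

```python
SAFE_BYTE_LIMIT = 1000  # assumed safety margin below the apparent 1024-byte buffer


def fits_in_buffer(input_str: str, limit: int = SAFE_BYTE_LIMIT) -> bool:
    """Check the UTF-8 encoded size, since the limit is in bytes, not characters."""
    return len(bytes(input_str, encoding="utf8")) < limit


ascii_ok = fits_in_buffer("a" * 999)      # 999 bytes
cyrillic_ok = fits_in_buffer("я" * 499)   # 2 bytes per letter, 998 bytes
cyrillic_big = fits_in_buffer("я" * 500)  # 1000 bytes, not under the limit
```

Note that Cyrillic letters take two bytes each in UTF-8, so Russian input reaches the limit at roughly half the character count of ASCII input.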