Rob Reynolds

53 comments by Rob Reynolds

@Phaqui This is a limitation of the `manylinux` images I use to build the wheels. If @TinoDidriksen's solution doesn't work for you, you may be able to... 1. `git...

It might be better to use `'\u0301'` (the combining acute accent) to mark stress. It is much less likely to occur in input texts, and it is much more readable, since it renders as an accent over the stressed vowel.
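A minimal sketch of the idea, assuming stress is marked by inserting U+0301 directly after the stressed vowel (the helper names `mark_stress` and `strip_stress` are hypothetical). Because the mark is its own code point, stripping it with a plain `str.replace` recovers the unstressed spelling.

```python
# U+0301 COMBINING ACUTE ACCENT: it renders over the preceding character.
STRESS = "\u0301"


def mark_stress(word: str, vowel_index: int) -> str:
    """Insert the stress mark right after the character at `vowel_index` (hypothetical helper)."""
    return word[: vowel_index + 1] + STRESS + word[vowel_index + 1 :]


def strip_stress(text: str) -> str:
    """Remove every stress mark, restoring the plain spelling (hypothetical helper)."""
    return text.replace(STRESS, "")


stressed = mark_stress("вода", 3)        # stress on the final "а"
print(strip_stress(stressed) == "вода")  # True
```

Since the mark adds exactly one code point, it also cannot collide with ordinary punctuation such as `'` that may already appear in the input.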

Issue created here: https://github.com/hfst/hfst/issues/448

One year later, this is now an issue for Python 3.9, too.

As for the [CoNLL-U format](https://universaldependencies.org/format.html), there does not appear to be any way to represent ambiguity, so the conversion would be lossy.
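To make the lossiness concrete, here is a minimal sketch, assuming each token's analyses have already been mapped to hypothetical `(lemma, upos)` pairs. A CoNLL-U token line has exactly ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), with `_` for unspecified fields, so there is a single LEMMA slot and a single UPOS slot and every reading after the first is dropped.

```python
def to_conllu_line(idx, form, readings):
    """Format one token as a 10-column CoNLL-U line (hypothetical helper).

    `readings` is a list of (lemma, upos) pairs. CoNLL-U has one LEMMA
    and one UPOS column, so only the first reading is kept.
    """
    lemma, upos = readings[0] if readings else ("_", "_")
    # XPOS, FEATS, HEAD, DEPREL, DEPS and MISC are left unspecified ("_").
    cols = [str(idx), form, lemma, upos] + ["_"] * 6
    return "\t".join(cols)


# The ("стать", "VERB") reading has nowhere to go and is lost.
print(to_conllu_line(1, "стали", [("сталь", "NOUN"), ("стать", "VERB")]))
```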

`mystem` can have ambiguous readings separated by `|` in its output, even with the `-d` (disambiguate) flag:

```bash
$ echo "Мы уже работаем здесь три недели." | mystem3.1 -ind
Мы{мы=SPRO,мн,1-л=им}...
```
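A minimal sketch for pulling the alternative readings out of one `form{...}` token, assuming the `-i` output shape shown in the command above. It also assumes, and this is not confirmed here, that a `|` inside parentheses separates alternative grammeme sets within a single reading rather than whole readings, so only top-level `|` characters are treated as reading boundaries. Both function names are hypothetical.

```python
def split_top_level(text, sep="|"):
    """Split `text` on `sep`, ignoring separators nested inside parentheses."""
    parts, buf, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts


def parse_mystem_token(token):
    """Parse one `form{reading|reading...}` token into (form, [readings])."""
    form, brace, rest = token.partition("{")
    if not brace:
        return form, []  # no {...} analysis attached to this token
    return form, split_top_level(rest.rstrip("}"))


print(parse_mystem_token("Мы{мы=SPRO,мн,1-л=им}"))
# ('Мы', ['мы=SPRO,мн,1-л=им'])
```

A token is ambiguous exactly when the returned list has more than one element, which is a cheap way to count how often `-d` leaves readings undecided.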

And web apps: https://datayze.com/readability-analyzer.php http://www.analyzemywriting.com/

It looks like SynTagRus has now been published in a Universal Dependencies format: https://github.com/UniversalDependencies/UD_Russian-SynTagRus/tree/master

Other Russian UD treebanks also exist: https://universaldependencies.org/#russian-treebanks
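Since these treebanks are all distributed as CoNLL-U, a small reader is enough to start working with them. This is a minimal sketch using only the standard library; `read_conllu` is a hypothetical name. It follows the format's conventions that `#` lines are comments, blank lines end a sentence, and IDs containing `-` (multiword-token ranges) or `.` (empty nodes) are not regular words, so it skips those.

```python
def read_conllu(lines):
    """Yield sentences as lists of token dicts from CoNLL-U lines."""
    sentence = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            if sentence:  # blank line closes the current sentence
                yield sentence
                sentence = []
        elif line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        else:
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue  # skip multiword-token ranges and empty nodes
            sentence.append(
                {"id": int(cols[0]), "form": cols[1], "lemma": cols[2], "upos": cols[3]}
            )
    if sentence:  # file may not end with a blank line
        yield sentence


# Usage (the path is a placeholder for a downloaded .conllu file):
# with open("path/to/file.conllu", encoding="utf8") as f:
#     for sent in read_conllu(f):
#         print([tok["lemma"] for tok in sent])
```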

I submitted an issue to `HFST` about this: https://github.com/hfst/hfst/issues/483. The maximum buffer size appears to be `1024` bytes, so a workaround could check `len(bytes(input_str, encoding='utf8')) < 1000`, and use a regular subprocess...
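A minimal sketch of that workaround's dispatch logic. The check has to count UTF-8 bytes rather than characters, because each Cyrillic letter encodes to two bytes, so a 600-letter Russian string already exceeds the limit. The two analyzer callables are hypothetical placeholders supplied by the caller; neither is an HFST API.

```python
# Assumption: stay a little under the ~1024-byte buffer observed in the issue.
BYTE_LIMIT = 1000


def fits_buffer(text, limit=BYTE_LIMIT):
    """True if `text` encodes to fewer than `limit` bytes of UTF-8."""
    return len(text.encode("utf8")) < limit


def analyze(text, in_process, via_subprocess, limit=BYTE_LIMIT):
    """Use the in-process analyzer for short input, otherwise a subprocess.

    `in_process` and `via_subprocess` are hypothetical callables taking the
    input string; they stand in for whatever lookup the caller already uses.
    """
    if fits_buffer(text, limit):
        return in_process(text)
    return via_subprocess(text)


print(fits_buffer("a" * 600))  # True: 600 bytes
print(fits_buffer("я" * 600))  # False: 1200 bytes
```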