Osma Suominen
I took a look at this (sorry for the delay!) and I think this is a really good feature. Some thoughts: 1. Agree that these commands should be marked as...
About overwriting vocabularies: I think it would be great if Annif noticed that the downloaded vocabulary differs from what exists locally. It should be no problem to download...
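One simple way to notice such a difference is to compare content hashes of the downloaded file and the local copy. The sketch below assumes both are plain files on disk. The helper names `file_digest` and `vocab_differs` are hypothetical and not part of Annif, and it does not reflect how Annif actually stores vocabularies.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def vocab_differs(downloaded: Path, local: Path) -> bool:
    """Hypothetical check: True if the local copy is missing or its bytes differ."""
    if not local.exists():
        return True
    return file_digest(downloaded) != file_digest(local)
```

A byte-level hash flags any change, including harmless reordering of triples in an RDF file. A semantic comparison would need more work.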
If it's the above bug in zipfile, it's [claimed](https://bugs.python.org/issue26185#msg374697) to be "Fixed in 3.9.0". Annif just dropped 3.8 support, so maybe we're good? Just need to rebase this branch.
:tada:
What other character codes besides `U+fb50067` have we seen? Could these be used to reverse engineer some possible explanation? For example, if a UTF-8 sequence somehow got cut in the...
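As a starting point for that kind of reverse engineering, one can split each reported code point into its raw bytes and look for patterns. The sketch below uses the two values mentioned in this thread. It only displays the numbers and makes no claim about what produced them.

```python
# Code points reported in the tokenizer errors discussed in this thread.
SUSPICIOUS = [0xFB50067, 0xFB5005D]


def split_bytes(cp: int) -> bytes:
    """Return the code point as 4 big-endian bytes so patterns are easier to spot."""
    return cp.to_bytes(4, "big")


for cp in SUSPICIOUS:
    raw = split_bytes(cp)
    print(f"U+{cp:x}", raw.hex(), raw)
```

Both values share the prefix `0f b5 00`. Their lowest bytes, 0x67 and 0x5d, happen to be the ASCII characters `g` and `]`. Whether that is meaningful or a coincidence is unclear.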
Both `U+fb50067` and `U+fb5005d` are way beyond the currently valid range of Unicode characters (up to `U+10FFFF`) and it shouldn't even be possible to represent them with the current UTF-8...
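Python can confirm the range argument directly. `sys.maxunicode` is 0x10FFFF, and `chr()` raises `ValueError` for anything above it. The helper names below are illustrative only.

```python
import sys


def is_valid_code_point(cp: int) -> bool:
    """True if cp lies in the Unicode code space U+0000..U+10FFFF."""
    return 0 <= cp <= sys.maxunicode


def try_chr(cp: int):
    """Return the character for cp, or None if Python rejects the value."""
    try:
        return chr(cp)
    except ValueError:
        return None


for cp in (0x10FFFF, 0xFB50067, 0xFB5005D):
    print(f"U+{cp:x}", is_valid_code_point(cp), try_chr(cp) is not None)
```

A Python `str` therefore cannot hold such a value at all. That fits the suspicion that the number is produced by corrupted data somewhere before it reaches Python.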
From the above error log, apparently this was the text to tokenize: 'sivuaineopintojen suhteen eroa havaittiin vain kahden sivuaineryhmän välilä.' What's the Unicode character near the end?
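To see exactly which non-ASCII characters a string contains, you can list their positions, code points, and Unicode names with the standard `unicodedata` module. This is a minimal sketch. The copy of the sentence here may not preserve the original offending character, so it is best run on the raw string taken from the log.

```python
import unicodedata


def describe_non_ascii(text: str) -> list:
    """Return (index, 'U+XXXX', name) for every non-ASCII character in text."""
    out = []
    for i, ch in enumerate(text):
        if ord(ch) > 0x7F:
            name = unicodedata.name(ch, "<unnamed>")
            out.append((i, f"U+{ord(ch):04X}", name))
    return out


text = "sivuaineopintojen suhteen eroa havaittiin vain kahden sivuaineryhmän välilä."
for item in describe_non_ascii(text):
    print(item)
```

For ordinary Finnish text, you would expect only entries such as `U+00E4 LATIN SMALL LETTER A WITH DIAERESIS`. Anything else in the output is a candidate for the problem character.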
My hunch is that at the point this error happens, something has already gone wrong somewhere, likely inside Voikko. Maybe there is some internal cache or other data structure that...
Hello @lunactic, thank you for the suggestion! There is already some work being done to integrate Annif with language models, mainly by integrating the XTransformer model from PECOS in...
Sounds like a nice feature, PRs welcome!