Elia Robyn Lake (Robyn Speer)
I definitely think an OrderedSet belongs in the stdlib. I think that an OrderedSet that fits naturally in the stdlib would be different from this implementation, which predates the current...
That's right -- without a much fancier heuristic, we can't tell that "RosŽ" isn't the correct string.
Last time I updated the input corpora, Basque just missed the cutoff for having enough text for me to consider the frequencies representative. I had left myself a note that...
Closing because the wordfreq data is unlikely to be updated in any language.
I previously made this note because I thought we weren't supporting ISO-8859-2 mojibake at all, but we are. This word decodes correctly in the context of other ISO-8859-2 mojibake.
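For anyone unfamiliar with what ISO-8859-2 mojibake looks like, here is a minimal sketch using only Python's built-in codecs. The sample word is a hypothetical example of mine, not the word from this issue, and this only illustrates the round trip; it is not how the library's detection heuristics work.

```python
# Hypothetical sample word (an assumption for illustration only).
original = "Łódź"

# Mojibake: the UTF-8 bytes get misread as ISO-8859-2.
garbled = original.encode("utf-8").decode("iso-8859-2")

# Reversing the mistake: re-encode as ISO-8859-2, then decode as UTF-8.
repaired = garbled.encode("iso-8859-2").decode("utf-8")

print(repr(garbled), "->", repaired)
```

The hard part in practice is deciding when a string is garbled in the first place, which is where the surrounding context of other ISO-8859-2 mojibake helps.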
To be able to use wordfreq in Japanese, you need to have a UTF-8 compatible version of MeCab installed. If your package manager doesn't provide one (I checked and confirmed...
Oh, there's more that you need to actually get the dictionary:

```
cd ../mecab-ipadic
./configure --enable-utf8-only
make
sudo make install
```
I haven't confirmed that this part works, unfortunately, and I can't read Japanese well enough to follow the documentation.
Oh, I see! On CentOS, unlike on Ubuntu, the unmarked version is the UTF-8 one. I saw the reference to "EUCJP", but that's a separate version of the package, marked...
I'm sorry, I don't understand what that's doing.