Elia Robyn Lake (Robyn Speer)

62 comments by Elia Robyn Lake (Robyn Speer)

I definitely think an OrderedSet belongs in the stdlib. I think that an OrderedSet that fits naturally in the stdlib would be different from this implementation, which predates the current...

That's right -- without a much fancier heuristic, we can't tell that "RosŽ" isn't the correct string.
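To illustrate why a string like this is hard to flag, here is a minimal sketch. It assumes (this is not stated in the comment) that the text was originally "Rosé", written as MacRoman bytes and then read as Windows-1252. The result consists entirely of ordinary letters, so nothing about it looks mechanically broken.

```python
# Assumption: the garbled string arose from MacRoman bytes decoded as Windows-1252.
original = "Rosé"
garbled = original.encode("mac_roman").decode("windows-1252")
print(garbled)  # RosŽ

# Every character is a normal alphabetic letter, so a simple
# "does this contain suspicious symbols?" check finds nothing to object to.
looks_like_plain_text = all(ch.isalpha() for ch in garbled)
print(looks_like_plain_text)  # True
```

Telling the two spellings apart would take knowledge about which words are plausible, not just which characters are.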

Last time I updated the input corpora, Basque just missed the cutoff for having enough text for me to consider the frequencies representative. I had left myself a note that...

Closing because the wordfreq data is unlikely to be updated in any language.

I previously made this note because I thought we weren't supporting ISO-8859-2 mojibake at all, but we are. This word decodes correctly in the context of other ISO-8859-2 mojibake.
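As a rough illustration of what ISO-8859-2 mojibake is, the sketch below uses only Python's built-in codecs. The helper names `to_mojibake` and `from_mojibake` and the sample word are hypothetical, chosen for this example; this is not how any particular library detects or repairs the problem.

```python
def to_mojibake(text: str) -> str:
    """Simulate the error: UTF-8 bytes misread as ISO-8859-2."""
    return text.encode("utf-8").decode("iso-8859-2")


def from_mojibake(garbled: str) -> str:
    """Undo the error by reversing the two steps."""
    return garbled.encode("iso-8859-2").decode("utf-8")


word = "źródło"  # hypothetical sample word with non-ASCII letters
garbled = to_mojibake(word)
restored = from_mojibake(garbled)

print(repr(garbled))  # each non-ASCII letter has become two characters
print(restored == word)  # True
```

The reversal works here because Python's `iso-8859-2` codec maps every byte to some character, so no information is lost in the bad decode.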

To be able to use wordfreq in Japanese, you need to have a UTF-8 compatible version of MeCab installed. If your package manager doesn't provide one (I checked and confirmed...

Oh, there's more that you need to actually get the dictionary:

```
cd ../mecab-ipadic
./configure --enable-utf8-only
make
sudo make install
```

I haven't confirmed that this part works, unfortunately, and I can't read Japanese well enough to follow the documentation.

Oh, I see! On CentOS, unlike on Ubuntu, the unmarked version is the UTF-8 one. I saw the reference to "EUCJP", but that's a separate version of the package, marked...