Nicolas Patry

Results 977 comments of Nicolas Patry

Hi, what you're seeing is 100% working as intended. The elements in the list ARE elements of your vocab, so they cannot be uppercased if your vocabulary isn't uppercased. `lowercase=True`...

@marcmk6 Sorry, didn't see this the first time. Try doing `Unigram(vocab=vocab, unk_id=0)` (it's either `unk` or `unk_id`, I don't remember). Basically the vocab has no idea what your unk is,...

Did you hit your RAM limit, so that it started to use swap? That could be an explanation. Did you try a `1GB` dataset to see if that fits?

Can you also provide a memory snapshot when it blocks? (`top` or `htop` for instance). 45GB can require quite a lot more memory (depends on the data) so if you...

@KatieGoz Probably unrelated, but did the read time really go from 7 min to 2h48? That seems pretty off. Also the compute merges step is actually starting this time, but it...

There is no cache, no. I don't see why there would be such a difference if your code has not changed. Could it also be the underlying hardware? If the issue...

I tried to reproduce your issue on English data (`big.txt` within the test files, repeated many times). However, the binary heap was extremely fast compared to other operations (1 order...

Hi @dszhengyu, thanks for the report. - 180GB is likely to trigger some bug where we overflow the `u32` count method (I can't be certain it will trigger,...

Hi! No plans for now to support Golang, but we might add support for a `cli`, which would make it usable from Golang I guess. If you want to write...

We are not focusing on adding new language bindings at the moment, but on stabilizing the current API. Once that's done, adding new languages would definitely be appreciated. If you're...