Nicolas Patry

977 comments by Nicolas Patry

I personally don't see any issue, but this will only work with BPE, so it won't be a general method. Would you want to tackle this PR? Pinging @n1t0...

Yes, the seed is model-dependent, and I think that opens up the encapsulation of this library a bit too much. In particular, maintaining it in all the bindings is going to...

Still not at the moment. You can open a PR if you wish. The main concern would be that this would entirely bypass `pre_tokenizer`, `normalizer`, `post_processor` and the...

Hi @HelloRusk, I fully understand your frustration if you spent time on this. > As you know, there are some languages that consist of a huge number of characters...

> Still, I personally think it would be more user-friendly to have some kind of warning output when the total number of characters exceeds the value. I see!,...

`ia32`: what's your platform/CPU? We seem to prebuild only for `x64`. A short-term fix would be for you to build directly from source. ```bash git clone https://github.com/huggingface/tokenizers cd...

It seems you are on `i386`, not `x64`, so I guess you need to build from source, as said earlier. Unfortunately, I don't really know enough about `npm` to understand...

Do you have Rust installed? https://www.rust-lang.org/tools/install

Yes, but there are issues; for instance, we don't support Node 16+ (we need to rewrite the bindings). Can you try with Node 12/14? I know this is bothersome,...

@chinoll I can confirm this issue. It is linked to an inconsistency in how the trainer is automatically called from your code. A quick fix for your use case is this: ```python...