Anthony MOI

33 comments by Anthony MOI

That's great news @alexeyr! I'm looking forward to seeing this! I think you can go with copying it for now, that seems totally fine.

What would be your take on this? Does being opinionated on this (by picking our preferred `Result`-like types) make it harder for somebody to use it?

No, that's not possible; you'll indeed have to add the tokens manually.

It depends on the specific tokenization algorithm, but the tokenizer doesn't save all the training state that would be needed to pick the training back up where it was initially...

We should probably do some more benchmarks for this. This is indeed surprising, but I guess it is highly dependent on the different use cases, and might not reflect the...

Hi @hkennyv, and thank you for reporting this. We don't build wheels for Apple Silicon at the moment because there is no environment for this on our GitHub CI. (cf...

Hi @joepalermo, would you mind sharing the resulting `tokenizer.json` file? It would be very helpful for us to debug this.

Have you tried the solution proposed by @lukas-blecher to use a pre-tokenizer? I believe this issue is related to this one: https://github.com/huggingface/tokenizers/issues/645

Looks good to me, but I'm no expert (maybe @thomwolf knows better). Feel free to add it to the readme, and we'll merge this :slightly_smiling_face:

Thank you for suggesting this. As far as I know, WebAssembly does not support multithreading, so I don't know how this would integrate. We clearly don't want to sacrifice the...