Nicolas Patry

Results 978 comments of Nicolas Patry

I guess it's because you keep instantiating the tokenizer that way; there really should be a way to have it once per thread. Other options would be to batch encode...
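
To make the "once per thread" idea concrete, here is a minimal sketch using only the standard library. `ToyTokenizer` is a hypothetical stand-in for a tokenizer that is expensive to build (it just splits on whitespace), and `get_tokenizer` is an illustrative helper, not part of any library. With the real `tokenizers` package you would build the actual tokenizer inside `get_tokenizer` instead, and `encode_batch` there plays the role of the batching alternative.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer that is costly to construct."""

    instances = 0              # counts how many times we paid the build cost
    _lock = threading.Lock()

    def __init__(self):
        with ToyTokenizer._lock:
            ToyTokenizer.instances += 1

    def encode(self, text):
        # Assumption: whitespace split, only to keep the sketch runnable.
        return text.split()

    def encode_batch(self, texts):
        return [self.encode(t) for t in texts]


_local = threading.local()


def get_tokenizer():
    """Return this thread's tokenizer, building it on first use only."""
    tok = getattr(_local, "tokenizer", None)
    if tok is None:
        tok = ToyTokenizer()
        _local.tokenizer = tok
    return tok


def encode(text):
    return get_tokenizer().encode(text)


texts = ["hello world", "a b c", "one"] * 10

# Option 1: one tokenizer per worker thread, reused across calls.
with ThreadPoolExecutor(max_workers=4) as pool:
    per_thread = list(pool.map(encode, texts))

# Option 2: a single instance encoding the whole batch in one call.
batched = get_tokenizer().encode_batch(texts)
```

Either way, the construction cost is paid at most once per thread rather than once per input.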

Do you mind sharing, for other users maybe?

Is using `_tokenizer` directly possible on your part? (Don't call `tokenizer.encode` anymore.) `transformers` needs to maintain backward compatibility and is unlikely to change any of its API. `tokenizers` is...

Does it do the same thing? From the docs, it seems to be a simple whitespace split, not really a BPE or Unigram tokenizer: https://www.tensorflow.org/tutorials/tensorflow_text/intro If this is the...

Hi @felix-schneider, Yes, the `.pyi` files are intended as help, and AFAIK there's no way to make them perfectly consistent (as the bindings are in Rust, so it's custom code...

The doc for `decoder` is here: https://huggingface.co/docs/tokenizers/python/latest/components.html?highlight=decoder#decoders As it mentions, it's a way to revert some modifications when getting back text (while `decoding` :D). The generated bindings can definitely be made...
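
As a rough illustration of what "reverting modifications" means, here is a toy pure-Python sketch of WordPiece-style decoding. It is not the library's implementation: `wordpiece_decode` is a hypothetical helper, and the `##` continuation prefix is assumed from the usual WordPiece convention.

```python
def wordpiece_decode(tokens, prefix="##"):
    """Toy decoder: glue continuation pieces back onto the previous word."""
    words = []
    for tok in tokens:
        if tok.startswith(prefix) and words:
            # Continuation piece: strip the marker and append to the last word.
            words[-1] += tok[len(prefix):]
        else:
            words.append(tok)
    return " ".join(words)


decoded = wordpiece_decode(["token", "##izer", "##s", "are", "fun"])
print(decoded)  # tokenizers are fun
```

Without this step, joining the tokens naively would leave the `##` markers and extra spaces in the output text.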

> How to deal with existing "local dir" that have symlinks in them. I would advocate for a duplicate local file + remove symlink. Given the anticipated flow, isn't that...

@rwightman Your use case would just mean:
- bob has no cache, makes a bunch of HEAD requests to the hub to recreate the metadata (in whatever format), and picks...

Hi @kiszk, Sorry about the delay, I haven't been able to check every place I'm being pinged on (and far from it). I understand the issue and the fix. From...

Managed to trigger the issue https://github.com/huggingface/safetensors/actions/runs/10109667229/job/27958053499?pr=507 on old tests with random values.