Nicolas Patry
Hi, the behavior can be explained by the fact that the encoder splits on whitespace and discards it, then the decoder uses `Metaspace` (which is for the `spm` behavior) which...
Late to the party but everything should be deterministic (afaik at least). But `Trie` is a simple class object, so afaik its hash function is linked to its `id(self)` so...
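To illustrate the point about identity-based hashing, here is a minimal sketch. `Node` is a hypothetical stand-in for any plain class that defines neither `__eq__` nor `__hash__`; it is not the actual `Trie` class. In CPython the default hash of such an object is derived from its memory address, so it can change from one run to the next.

```python
class Node:
    """Hypothetical plain class: no custom __eq__ or __hash__."""

    def __init__(self, value):
        self.value = value


a = Node("x")
b = Node("x")

# Equality falls back to identity, so equal content does not mean equal objects.
same_content_equal = (a == b)

# The default hash is the one inherited from `object`, tied to the object's identity.
uses_default_hash = hash(a) == object.__hash__(a)

# Both objects are kept as distinct set members; the order in which a set of
# such objects is iterated may differ between runs because addresses differ.
members = {a, b}
```

If iteration order over such objects matters, keeping them in a list or a dict (rather than a set) avoids depending on the hash at all.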
Which Python version are you using? The trie is basically a big dict of dicts, so whether it is deterministic depends on the Python version: https://stackoverflow.com/questions/2053021/is-the-order-of-a-python-dictionary-guaranteed-over-iterations Maybe the investigation is actually not...
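A rough sketch of the "dict of dicts" idea, assuming Python 3.7 or later, where dicts are guaranteed to preserve insertion order. `build_trie` and the `""` end-of-word marker are made up for illustration and are not the library's implementation.

```python
def build_trie(words):
    """Build a nested-dict trie; the empty-string key marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            # setdefault creates the child dict on first sight of the character.
            node = node.setdefault(ch, {})
        node[""] = True
    return root


trie = build_trie(["bc", "ab", "ba"])

# On Python 3.7+, keys come back in insertion order, not sorted order.
first_level = list(trie)
children_of_b = list(trie["b"])
```

With the same input order, the traversal order is the same on every run, which is why the Python version matters here: before 3.7 that ordering was not part of the language guarantee.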
You can use [tch-rs](https://github.com/LaurentMazare/tch-rs) to load models and use tokenizers from Rust if that's a viable option for you.
It is here: https://huggingface.co/docs/tokenizers/components#components However, there don't seem to be nice code examples for this in the docs. PRs are welcome :)
In principle, very favorable. In practice:
```python
def process(encoding: Encoding, pair_encoding: Encoding, add_special_tokens: bool) -> Encoding
```
So with this signature, it's destructive, so it's not really obvious how we should...
This is roughly what I suggested to @mishig25 as a solution. Actually using `Vec` instead of pairs, since a pair is also limiting in some ways (#804), but roughly it's...
Hi @MRGLabs , I can't seem to reproduce this. Which version of `transformers` are you using ? Btw, `T5Tokenizer` is the "slow" version (not this lib), `T5TokenizerFast` is the one...
> Hi @Narsil , I'm using transformers 4.16.2. > I really can't seem to reproduce from a fresh install. Isn't there something changing the log level or something in your...
Hi @tekumara , Unfortunately GH actions don't support running darwin arm yet (afaik). So all those prebuilt versions are done manually. If you know how to do prebuilt binaries automatically...