Nicolas Patry
Hmm, this seems pretty odd. Version 0.8.1rc2 does seem to have a bug in it; however, it is an extremely old version, and it should never have to go back there....
Possibly. Is `jax` now supported on Windows? If yes, you can run `pip install transformers` and `pip install jax` separately, and everything should work. If that's the case, you...
Thanks for all this information. Glad you could find a workaround for that issue; I am pretty sure it could help other users.
Hi @umbra-scientia. tl;dr: you can probably just set the value in the `tokenizer.json` file from gpt2: https://huggingface.co/gpt2/raw/main/tokenizer.json and then load it from file. The long answer: `tokenizers` isn't really modeled...
Hi @avi-jain, do you mind sharing which tokenizer you are using? It's hard to help you without knowing what kind of tokenizer you are using and if...
Ok. `\ud83c\udf37` is NOT valid UTF-8. BERT normalizes strings before processing them and hence discards those characters. ``` In [11]: tokenizer('\ud83c\udf37') Out[11]: {'input_ids': [101, 102], 'token_type_ids': [0, 0], 'attention_mask': [1, 1]}...
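
To make the UTF-8 point concrete, here is a minimal sketch using only the Python standard library. It assumes the input really is the two-character string `'\ud83c\udf37'` (a pair of lone surrogate code points), and the helper names `is_valid_utf8_text` and `join_surrogates` are hypothetical, not part of `tokenizers` or `transformers`. It only illustrates why such a string cannot be encoded, and one possible way to merge a well-formed surrogate pair into a single character before tokenizing.

```python
# A Python str may hold lone surrogate code points, but they cannot be encoded as UTF-8.
s = "\ud83c\udf37"


def is_valid_utf8_text(text: str) -> bool:
    """Return True if `text` can be encoded as UTF-8 (hypothetical helper)."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


def join_surrogates(text: str) -> str:
    """Merge surrogate pairs into single code points (hypothetical helper).

    Round-trips through UTF-16 with the `surrogatepass` error handler.
    Raises UnicodeDecodeError if the text contains an unpaired surrogate.
    """
    return text.encode("utf-16", "surrogatepass").decode("utf-16")


fixed = join_surrogates(s)
print(is_valid_utf8_text(s))      # the surrogate pair is rejected
print(is_valid_utf8_text(fixed))  # the merged code point is accepted
```

Whether a given tokenizer then keeps or drops the merged character still depends on its normalizer and vocabulary; this sketch only addresses the encoding step.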
Hi, I think this specific question belongs more in https://github.com/huggingface/transformers. (Or even the discuss forum, since it seems highly intentional: https://discuss.huggingface.co/ and probably has some historical answer.) First of all...
I am going to leave this open, just because I don't have the background as to why this was added. But it feels intentional enough that it's not a bug for...
It's not on the roadmap afaik. It would require quite a bit of change in the underlying structure of the code to support that. Is there anything wrong with: ```python...
You're free to open a PR for this, but unfortunately this won't be just a couple of files. It's not impossible for sure, but it will shift quite a bit...