Nicolas Patry

Results 977 comments of Nicolas Patry

Hmm, this seems pretty odd. Version 0.8.1rc2 does seem to have a bug in it; however, it is an extremely old version, it should never have to go back there....

Possibly. Is `jax` now supported on Windows? If yes, you can run `pip install transformers` and `pip install jax` separately, and everything should work. If that's the case, you...

Thanks for all this information. Glad you could find a workaround for that issue; I am pretty sure it could help other users.

Hi @umbra-scientia. tl;dr: you can probably just set the value in the `tokenizer.json` file from gpt2: https://huggingface.co/gpt2/raw/main/tokenizer.json Then just load it from file. The long answer: `tokenizers` isn't really modeled...

Hi @avi-jain, do you mind sharing which tokenizer you are using? It's hard to help you without knowing what kind of tokenizer you are using and if...

Ok. `\ud83c\udf37` is NOT valid UTF-8. Bert normalizes strings before treating them, hence discards those characters. ``` In [11]: tokenizer('\ud83c\udf37') Out[11]: {'input_ids': [101, 102], 'token_type_ids': [0, 0], 'attention_mask': [1, 1]}...
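
To illustrate the point about `\ud83c\udf37`, here is a minimal sketch using only the Python standard library (no tokenizer involved). It shows that the string holds two separate surrogate code points, which cannot be encoded as UTF-8. The helper names `is_utf8_encodable` and `merge_surrogates` are hypothetical, and merging the pair through UTF-16 is just one possible way to recover a single character, not something the comment above prescribes.

```python
# Illustration only: plain Python, independent of any tokenizer library.
s = "\ud83c\udf37"  # two lone surrogate code points, as in the comment above


def is_utf8_encodable(text: str) -> bool:
    """Return True if `text` can be encoded as strict UTF-8."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


def merge_surrogates(text: str) -> str:
    """Hypothetical helper: round-trip through UTF-16 so that a
    high/low surrogate pair becomes one real code point."""
    return text.encode("utf-16", "surrogatepass").decode("utf-16")


fixed = merge_surrogates(s)

print(len(s), is_utf8_encodable(s))          # 2 False
print(len(fixed), is_utf8_encodable(fixed))  # 1 True
print(hex(ord(fixed)))                       # 0x1f337
```

If the input may contain such escaped pairs, one option is to merge them before tokenizing; whether that is appropriate depends on where the strings come from.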

Hi, I think this specific question belongs more in https://github.com/huggingface/transformers. (Or even the discuss forum, since it seems highly intentional: https://discuss.huggingface.co/ and probably has some historical answer.) First of all...

I am going to leave this open, just because I don't have the background as to why this was added. But it feels intentional enough it's not a bug for...

It's not on the roadmap afaik. It would require quite a bit of change in the underlying structure of the code to support that. Is there anything wrong with: ```python...

You're free to open a PR for this, but unfortunately this won't be just a couple of files. It's not impossible for sure, but it will shift quite a bit...