Arthur


Method 1 does not really work if you want to have a different token for padding and ``:

```python
>>> from transformers import LlamaTokenizer, LlamaForCausalLM
>>> tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
>>> ...
```

Given the recent release of Llama 2, and in light of the fact that resizing from 32K to 32K+1 can make inference and training slower, we will support `padding_index=-1`. I'll be working...

If you set the padding index of the token embedding layer to -1, you don't need to change the size of the vocab, either for the model or for the...
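
For illustration, a minimal PyTorch sketch of the mechanism (the sizes are made up; note that `nn.Embedding` normalizes a negative `padding_idx` to `num_embeddings + padding_idx`):

```python
import torch
import torch.nn as nn

vocab_size, hidden = 32000, 16

# padding_idx=-1 is normalized by PyTorch to vocab_size - 1, so the
# embedding matrix keeps its original shape -- no resize to 32K+1.
emb = nn.Embedding(vocab_size, hidden, padding_idx=-1)
pad_id = emb.padding_idx  # 31999 after normalization

ids = torch.tensor([[5, 7, pad_id, pad_id]])
out = emb(ids)
print(out[0, 2].abs().sum())  # tensor(0.) -- padding positions are zero vectors

# The padding row also receives no gradient during training.
out.sum().backward()
print(emb.weight.grad[pad_id].abs().sum())  # tensor(0.)
```

This only sketches the embedding-side mechanics; how the tokenizer emits the padding id is a separate question.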

Hey! The PR is not merged yet; it should be by the end of the week!

Yes! The idea is that, depending on your hardware, you should choose a `pad_to_multiple_of` value. This is for people who need performance optimisation. Otherwise, just add a padding token and...
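
For context, a rough sketch using `resize_token_embeddings` (the checkpoint name and the multiple of 64 are just illustrative; pick a multiple suited to your hardware):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# Add a dedicated padding token (32000 -> 32001)...
tokenizer.add_special_tokens({"pad_token": "<pad>"})

# ...and round the embedding matrix up to a hardware-friendly size
# instead of leaving it at the slow 32K+1 shape.
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)
model.config.pad_token_id = tokenizer.pad_token_id
```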

Okay! I'll review again; can you make sure `make quality` and `make repo-consistency` both pass?

Nope, thanks for the ping! It's just that there are a lot of changes across a lot of models (a lot of old models too 😉). Getting to...

If some attributes do not exist, let's just add the `# Adapted from` mention, and put the `# Copied from` only where it properly fits!
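
For reference, the two markers as they appear in transformers modeling files (the class and module names below are made up):

```python
import torch.nn as nn

# Copied from transformers.models.llama.modeling_llama.LlamaMLP with Llama->NewModel
class NewModelMLP(nn.Module):
    ...  # must stay an exact copy; enforced by `make repo-consistency`

# Adapted from transformers.models.llama.modeling_llama.LlamaAttention
class NewModelAttention(nn.Module):
    ...  # diverges from the original, so it is not mechanically checked
```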