Arthur
Method 1 does not really work if you want to have a different token for padding and ``:

```python
>>> from transformers import LlamaTokenizer, LlamaForCausalLM
>>> tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
>>> ...
```
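For reference, a minimal sketch of what I mean, assuming "Method 1" is reusing an existing special token (for example `eos`) as the padding token; the checkpoint is just the one from the snippet above:

```python
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

# "Method 1": reuse an existing special token as the padding token.
tokenizer.pad_token = tokenizer.eos_token

# Padding and end-of-sequence now share a single id, so the two roles
# cannot be told apart from the input ids alone.
print(tokenizer.pad_token_id == tokenizer.eos_token_id)  # True
```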
Given the recent release of Llama 2, and in light of the fact that resizing from 32K to 32K+1 can make inference and training slower, we will support `padding_index=-1`. I'll be working...
If you set the padding index of the token embedding layer to -1, you don't need to change the size of the vocab, either for the model or for the...
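To illustrate the intent, here is a plain-PyTorch sketch of the idea (not the actual implementation in the PR): with a sentinel pad id of -1, the embedding matrix keeps its original 32K rows and the padded positions are simply masked out.

```python
import torch
import torch.nn as nn

vocab_size, hidden_size = 32_000, 8
embed = nn.Embedding(vocab_size, hidden_size)  # no extra row added for padding

input_ids = torch.tensor([[5, 42, 7, -1, -1]])  # -1 marks padded positions
attention_mask = (input_ids != -1).long()

# Clamp the sentinel to a valid index for the lookup; the attention mask keeps
# those positions from contributing to attention scores or to the loss.
hidden_states = embed(input_ids.clamp(min=0)) * attention_mask.unsqueeze(-1)
```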
If you want to follow the progress: #25088
Hey! The PR is not merged yet; it should be by the end of the week!
Yes! The idea is that depending on your hardware, you should choose a `pad_to_multiple_of` value. This is for people who need performance optimisation. Otherwise, just add a padding token and...
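Something along these lines (a sketch only; the checkpoint is just an example, use whatever Llama checkpoint you have access to, and pick the multiple that suits your hardware):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "meta-llama/Llama-2-7b-hf"  # example checkpoint, any Llama model works
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Add a dedicated padding token instead of reusing an existing special token.
tokenizer.add_special_tokens({"pad_token": "<pad>"})

# Grow the embedding matrix to fit the new token, rounding its size up to a
# multiple of 64 (choose the value that matches your hardware, e.g. tensor cores).
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)
model.config.pad_token_id = tokenizer.pad_token_id
```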
cc @younesbelkada
Okay! I'll review again; can you make sure `make quality` and `make repo-consistency` both pass?
Nope, thanks for the ping, it's just that there are a lot of changes across a lot of models (a lot of old models too 😉). Getting to...
If some attributes do not exist, let's just add the `# Adapted from` mention, and put the `# Copied from` only where it properly fits!
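For anyone reading along, the convention looks roughly like this (the class names below are made up, only the comment format matters):

```python
import torch.nn as nn

# Copied from transformers.models.llama.modeling_llama.LlamaRMSNorm with Llama->NewModel
class NewModelRMSNorm(nn.Module):
    ...

# Adapted from transformers.models.llama.modeling_llama.LlamaAttention
# (some attributes differ here, so the automatic copy check would not pass)
class NewModelAttention(nn.Module):
    ...
```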