About padding_side='left'
Hi, I noticed that you set padding_side='left' in finetune.py. However, in llama the default padding side seems to be 'right'. Would this inconsistency cause problems such as a performance drop?
I think default llama doesn't even use padding tokens.
How can it train without padding? I thought padding is necessary to collate sentences of different lengths into the same batch?
@stellaludai correct, but during pre-training you usually don't use different lengths. See the HF tutorial for more details: https://huggingface.co/course/chapter7/6?fw=pt#preparing-the-dataset
@chrisociepa I see, so you mean that during LLM pretraining the sequences are chunked into equal-length pieces, unlike at inference where each sequence occupies one row of the input batch?
yes, exactly
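To make the chunking concrete, here is a minimal pure-Python sketch of the concatenate-and-chunk approach from the linked HF tutorial (the function name and variables are my own, not the tutorial's exact code):

```python
def group_texts(token_ids_list, block_size):
    """Concatenate tokenized examples and split into fixed-size blocks.

    The final partial block (shorter than block_size) is dropped,
    so every batch element has the same length and no padding is needed.
    """
    concatenated = [t for ids in token_ids_list for t in ids]
    total = (len(concatenated) // block_size) * block_size
    return [concatenated[i:i + block_size] for i in range(0, total, block_size)]

# Three sequences of different lengths, chunked with block_size=4
chunks = group_texts([[1, 2, 3], [4, 5], [6, 7, 8, 9]], block_size=4)
# -> [[1, 2, 3, 4], [5, 6, 7, 8]]; the leftover token 9 is dropped
```

This is why the default llama setup can get away without a pad token: the dataset itself guarantees uniform lengths.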
@chrisociepa @ElleLeonne Thank you very much for sharing!
I think @ElleLeonne just refers to the fact that there is no pad_token in the LlamaTokenizer config, the default value is None.
@chrisociepa Thank you for sharing the link. But I don't think it addresses the same thing as the original question:
The example in the link manually throws away chunks where length != context_length, but alpaca-lora does not do this when creating the dataset. Instead, it pads in the data collator:
data_collator=transformers.DataCollatorForSeq2Seq(
    tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
)
I think padding_side = left/right should not matter, but an attention_mask should be supplied so the model ignores the pad positions, which is not the case in alpaca-lora IMHO.
@Nsigma-Bill Thanks for sharing your opinion, but the answers above actually solved my question. In the link, the last remaining tail is thrown away if it is shorter than the previous chunks, which avoids padding issues. As for the attention mask, the transformers package should already handle that.