Padding for training and inference
Is Llama 2 trained in batches? If so, why is there no pad token? I want to finetune the model and then run inference in batches. A suggestion is to pad on the left, and I know I should pad on the left for inference. Should I also pad on the left when finetuning?
Our inference code shows a way to pad tokens on the right, not the left. I believe that's a common way to do padding:
https://github.com/facebookresearch/llama/blob/556949fdfb72da27c2f4a40b7f0e4cf0b8153a28/llama/generation.py#L167-L170
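For reference, the linked snippet pre-allocates a fixed-size buffer filled with the pad id and copies each prompt in at the left, so the pads end up on the right. A minimal pure-Python sketch of that idea (not the actual Llama code; the pad value of -1 here is just a placeholder for tokenizer.pad_id):

```python
def make_token_buffer(prompt_tokens, total_len, pad_id=-1):
    """Right-pad each prompt into a fixed-size buffer.

    Generation then overwrites the pad slots in place, position by
    position, so each sequence's new tokens start right where its own
    prompt ends -- which is why the Llama loop can get away with
    right padding.
    """
    bsz = len(prompt_tokens)
    tokens = [[pad_id] * total_len for _ in range(bsz)]
    for row, prompt in zip(tokens, prompt_tokens):
        row[: len(prompt)] = prompt  # prompt on the left, pads on the right
    return tokens

buf = make_token_buffer([[1, 5, 9], [1, 7]], total_len=5)
# buf == [[1, 5, 9, -1, -1], [1, 7, -1, -1, -1]]
```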
@ruanslv
I have seen many discussions about padding_side (mostly for GPT-2), and the verdict was that since GPT-2 is an autoregressive causal language model, padding must be on the left. Otherwise, in batch inference, there will be pad tokens between the prompt and the generated tokens, which means the first generated token is sampled from the logits of the previous token (with right padding, a pad token), which is wrong. How is Llama different? Is there a specific reason generation.py uses right padding?
Has anyone tested the difference between using padding_left and padding_right?
@Reason-Wang as far as I am aware, padding should always be on the left for inference. The idea is that you want to enforce a certain length for your input prompts (usually for batching reasons, i.e. all input sequences in a batch need to have the same length). If you set padding_side="right", you add noise between the prompt and the generation. On the other hand, padding_side should be set to "right" during finetuning.
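To make the inference case concrete, here is a minimal pure-Python sketch of left padding (helper names are my own, not from any library): with pads on the left, every sequence's last real token sits in the final column, so the logits used to sample the first generated token always come from a real prompt token, never from a pad.

```python
def left_pad(batch, pad_id):
    """Left-pad a batch of token-id lists to a common length.

    Returns the padded batch plus an attention mask (1 = real token,
    0 = pad) so the model can ignore the pad positions.
    """
    max_len = max(len(seq) for seq in batch)
    padded = [[pad_id] * (max_len - len(seq)) + seq for seq in batch]
    mask = [[0] * (max_len - len(seq)) + [1] * len(seq) for seq in batch]
    return padded, mask

padded, mask = left_pad([[1, 5, 9], [1, 7]], pad_id=0)
# padded == [[1, 5, 9], [0, 1, 7]] -- the last column is a real token
# for every row, so generation can continue directly from index -1.
```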
I get that during inference padding should be added on the left. Why do we want the padding to be on the right during fine-tuning?