
Padding for training and inference

Open Reason-Wang opened this issue 1 year ago • 5 comments

Is Llama 2 trained with batches? If so, why is there no pad token? I want to fine-tune the model and then run inference in batches. A suggestion is to pad on the left, and I know I should pad on the left for inference, but should I also pad on the left during fine-tuning?

Reason-Wang avatar Sep 14 '23 16:09 Reason-Wang

Our inference code shows one way to do it, adding pad tokens on the right rather than the left. I believe that's a common way to do padding:

https://github.com/facebookresearch/llama/blob/556949fdfb72da27c2f4a40b7f0e4cf0b8153a28/llama/generation.py#L167-L170
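In outline, the padding there looks roughly like this (a paraphrased sketch, not the exact source; the toy token ids and `pad_id = 0` stand in for real tokenizer values):

```python
import torch

# Each prompt is copied starting at index 0, and the leftover positions on the
# right are filled with pad_id, i.e. right padding.
prompt_tokens = [[1, 306, 4966], [1, 15043]]   # toy token-id lists
pad_id = 0                                      # stand-in for tokenizer.pad_id
bsz = len(prompt_tokens)
total_len = max(len(t) for t in prompt_tokens)

tokens = torch.full((bsz, total_len), pad_id, dtype=torch.long)
for k, t in enumerate(prompt_tokens):
    tokens[k, : len(t)] = torch.tensor(t, dtype=torch.long)

print(tokens)
# tensor([[    1,   306,  4966],
#         [    1, 15043,     0]])   <- the shorter prompt is padded on the right
```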

ruanslv avatar Oct 12 '23 14:10 ruanslv

@ruanslv

I have seen many discussions about padding_side (mostly for GPT-2), and the verdict was that since GPT-2 is an autoregressive causal language model, padding must be on the left. Otherwise, in batched inference, there would be pad tokens between the prompt and the generated tokens, which means the first generated token would be predicted from the logits of the previous token, and in the right-padding case that previous token is a pad token, which is wrong. How is Llama different? Is there a specific reason generation.py uses right padding?
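A minimal illustration of the left-padding setup with Hugging Face transformers (sketch only; the checkpoint name and generation settings are placeholders, not from this thread):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-hf"        # placeholder checkpoint

# Left padding for batched generation: pads go *before* the prompt, so the last
# prompt token is always the final non-pad position and the first generated
# token conditions on it, not on a pad token.
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token      # Llama tokenizers ship without a pad token

model = AutoModelForCausalLM.from_pretrained(model_name)
model.config.pad_token_id = tokenizer.pad_token_id

prompts = ["The capital of France is", "Hi"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)
out = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.batch_decode(out, skip_special_tokens=True))

# With padding_side="right", the shorter prompt would be followed by pad tokens,
# and that row's generation would start after them, i.e. exactly the mismatch
# described above.
```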

RaminZi avatar Jan 31 '24 03:01 RaminZi

Has anyone tested the difference between padding_side="left" and padding_side="right"?

patrick-tssn avatar Mar 04 '24 13:03 patrick-tssn

@Reason-Wang as far as I am aware, padding should always be on the left for inference. The idea is that you want to enforce a fixed length for your input prompts (usually for batching reasons, i.e. all the input sequences in a batch need to have the same length). If you set padding_side="right", you add noise between the prompt and the generation. On the other hand, padding_side should be set to "right" during fine-tuning.
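For the fine-tuning side, a minimal sketch of the usual setup (placeholder checkpoint name and example texts; right padding plus masking the pad positions out of the loss with -100):

```python
from transformers import AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"        # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="right")
tokenizer.pad_token = tokenizer.eos_token

batch = tokenizer(
    ["### Instruction: say hi\n### Response: hi",
     "### Instruction: add 2+2\n### Response: 4"],
    return_tensors="pt",
    padding=True,
)
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100    # pad tokens contribute nothing to the loss
batch["labels"] = labels
# The batch can now be fed to a causal LM for training; nothing is generated
# during training, so the right-side pads never sit between a prompt and its
# continuation.
```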

ferrazzipietro avatar Mar 25 '24 13:03 ferrazzipietro

> @Reason-Wang as far as I am aware, padding should always be on the left for inference. The idea is that you want to enforce a fixed length for your input prompts (usually for batching reasons, i.e. all the input sequences in a batch need to have the same length). If you set padding_side="right", you add noise between the prompt and the generation. On the other hand, padding_side should be set to "right" during fine-tuning.

I get that during inference padding should be added on the left. Why do we want the padding to be on the right during fine-tuning?

zcakzhuu avatar May 10 '24 13:05 zcakzhuu