lit-llama
Use of left padding
Left padding makes more sense for autoregressive models, since generation continues from the last position of each sequence. HuggingFace's implementation also uses left padding for tokenization.
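To make the suggestion concrete, here is a minimal sketch (plain Python, no library dependencies; `PAD_ID` and `pad_batch` are illustrative names, not from either codebase) of why padding side matters for autoregressive generation:

```python
PAD_ID = 0  # placeholder pad token id for illustration

def pad_batch(sequences, pad_id=PAD_ID, side="left"):
    """Pad variable-length token-id lists to a common length."""
    max_len = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        pad = [pad_id] * (max_len - len(s))
        padded.append(pad + s if side == "left" else s + pad)
    return padded

batch = [[5, 6, 7], [8, 9]]
left = pad_batch(batch, side="left")    # [[5, 6, 7], [0, 8, 9]]
right = pad_batch(batch, side="right")  # [[5, 6, 7], [8, 9, 0]]

# With left padding, the last column holds each sequence's final real
# token, so next-token logits at the last position are meaningful for
# every row in the batch.
assert [row[-1] for row in left] == [7, 9]
# With right padding, the shorter sequence ends in a pad token, and the
# model would predict the next token from a pad position.
assert [row[-1] for row in right] == [7, 0]
```

In HuggingFace transformers this corresponds to setting `tokenizer.padding_side = "left"` before batched generation.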
Hi, I don't think we can use left padding for training. See #77. Left padding makes sense for batched inference. Could you elaborate a bit more on where you are suggesting a change?
Hey @saiajaym I just want to make sure there is no misunderstanding here, could you describe concretely what change needs to be made and where?