
InContextLearning*Dataset Default padding sides hardcoded?

Open · MFajcik opened this issue 1 year ago · 1 comment

Hi, I have a question about the code here: https://github.com/mosaicml/composer/blob/a7cad7c221ce8ad9697bde50db0b3f37f8b8025e/composer/datasets/in_context_learning_evaluation.py#L655

Why does it assume right padding (for InContextLearningMultipleChoiceTaskDataset, but also for some of the other datasets)?

  1. Shouldn't the padding_side be derived from the tokenizer?
  2. Assuming right padding breaks some models (Mistral is unusable).
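
To make point 1 concrete, here is a minimal sketch of what deriving the padding side from the tokenizer could look like. It is not Composer's code: `pad_tokens` and `pad_for_tokenizer` are hypothetical helpers, and the `SimpleNamespace` objects only stand in for a real tokenizer. The one thing assumed from the real API is that Hugging Face tokenizers expose `padding_side` and `pad_token_id` attributes.

```python
from types import SimpleNamespace
from typing import List


def pad_tokens(token_ids: List[int], max_len: int, pad_token_id: int,
               padding_side: str = "right") -> List[int]:
    """Hypothetical helper: pad a list of token ids up to max_len on the given side."""
    if padding_side not in ("left", "right"):
        raise ValueError(f"Unknown padding_side: {padding_side!r}")
    # Sequences already at or above max_len are returned unpadded (no truncation here).
    padding = [pad_token_id] * max(max_len - len(token_ids), 0)
    if padding_side == "left":
        return padding + list(token_ids)
    return list(token_ids) + padding


def pad_for_tokenizer(token_ids: List[int], max_len: int, tokenizer) -> List[int]:
    """Hypothetical helper: take the padding side from the tokenizer instead of hardcoding it."""
    # Fall back to right padding only if the tokenizer does not define a side.
    side = getattr(tokenizer, "padding_side", "right")
    return pad_tokens(token_ids, max_len, tokenizer.pad_token_id, side)


# Stand-ins for real tokenizers (assumption: only these two attributes are needed).
left_tokenizer = SimpleNamespace(padding_side="left", pad_token_id=0)
right_tokenizer = SimpleNamespace(padding_side="right", pad_token_id=0)

left_padded = pad_for_tokenizer([5, 6, 7], 5, left_tokenizer)
right_padded = pad_for_tokenizer([5, 6, 7], 5, right_tokenizer)
print(left_padded)   # [0, 0, 5, 6, 7]
print(right_padded)  # [5, 6, 7, 0, 0]
```

With something along these lines, a tokenizer configured for left padding would get left-padded batches without any change to the dataset class.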

Thanks for the information.

MFajcik · Dec 13 '23 15:12