
Tokenizer warning for multiple choice

Open · jaideep11061982 opened this issue 2 years ago · 0 comments

In https://github.com/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb, the data collator calls `tokenizer.pad`. I think this is a slow operation, and it triggers a warning suggesting that padding should be done when calling `tokenizer(...)` instead (for example with `padding=True`). Since padding inside the collator slows down training, is there a way to use the tokenizer's `padding` option directly?

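```python
# tokenizer, encoded_datasets and DataCollatorForMultipleChoice come from earlier cells of the notebook
accepted_keys = ["input_ids", "attention_mask", "label"]
features = [{k: v for k, v in encoded_datasets["train"][i].items() if k in accepted_keys} for i in range(10)]
batch = DataCollatorForMultipleChoice(tokenizer)(features)  # pads via tokenizer.pad -> warning
```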
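To make the trade-off concrete, here is a minimal stdlib-only sketch (it makes no `transformers` calls) of the two strategies. `collate_dynamic` mirrors roughly what a multiple-choice collator does: flatten every (example, choice) sequence, pad to the longest one in the batch, then regroup per example. `collate_prepadded` shows that if every sequence had already been padded to a fixed length at tokenization time (with the tokenizer this would be something like `tokenizer(..., truncation=True, padding="max_length", max_length=...)`), collation reduces to grouping lists. `PAD_ID`, `pad_to`, `collate_dynamic` and `collate_prepadded` are hypothetical names for illustration only; a real setup would use `tokenizer.pad_token_id` and return tensors.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical pad id; a real tokenizer exposes tokenizer.pad_token_id


def pad_to(ids: List[int], length: int, pad_id: int = PAD_ID) -> Dict[str, List[int]]:
    """Right-pad one token sequence to `length` and build its attention mask."""
    n_pad = length - len(ids)
    return {
        "input_ids": ids + [pad_id] * n_pad,
        "attention_mask": [1] * len(ids) + [0] * n_pad,
    }


def collate_dynamic(features):
    """Flatten all choices, pad to the longest sequence in this batch, regroup per example."""
    num_choices = len(features[0]["input_ids"])
    flat = [ids for f in features for ids in f["input_ids"]]
    longest = max(len(ids) for ids in flat)
    padded = [pad_to(ids, longest) for ids in flat]
    batch = {
        key: [
            [p[key] for p in padded[i * num_choices:(i + 1) * num_choices]]
            for i in range(len(features))
        ]
        for key in ("input_ids", "attention_mask")
    }
    batch["labels"] = [f["label"] for f in features]
    return batch


def collate_prepadded(features):
    """Sequences were padded to a fixed length at tokenization time, so just group them."""
    return {
        "input_ids": [f["input_ids"] for f in features],
        "attention_mask": [f["attention_mask"] for f in features],
        "labels": [f["label"] for f in features],
    }
```

For example, `collate_dynamic([{"input_ids": [[5, 6], [5, 6, 7]], "label": 1}, {"input_ids": [[8], [8, 9, 10, 11]], "label": 0}])` pads every choice to length 4, the longest sequence in that batch. The design choice is a trade-off: padding at tokenization time avoids the per-batch `pad` call, but pads every example to `max_length`, whereas dynamic padding only pads to the batch maximum. For short inputs, that saving can outweigh the cost of the slower `pad` call.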

jaideep11061982, Sep 19 '23 14:09