
Not able to add data_collator

Open: brand17 opened this issue 9 months ago • 1 comment

I am trying the example: Google Colab

The only change I made was adding a data_collator:

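    import torch
    from transformers import DataCollatorWithPadding, TrainingArguments
    from trl import SFTTrainer

    # model, tokenizer, train_dataset and max_seq_length come from the notebook's earlier cells.
    data_collator = DataCollatorWithPadding(tokenizer=tokenizer)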
    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        data_collator=data_collator,
        train_dataset=train_dataset,
        dataset_text_field="text",
        max_seq_length=max_seq_length,
        dataset_num_proc=2,
        packing=False,  # Can make training 5x faster for short sequences.
        args=TrainingArguments(
            per_device_train_batch_size=2,
            gradient_accumulation_steps=4,
            warmup_steps=5,
            max_steps=60,  # Set num_train_epochs = 1 for full training runs
            learning_rate=2e-4,
            fp16=not torch.cuda.is_bf16_supported(),
            bf16=torch.cuda.is_bf16_supported(),
            logging_steps=1,
            optim="adamw_8bit",
            weight_decay=0.01,
            lr_scheduler_type="linear",
            seed=3407,
            output_dir="outputs",
        ),
    )

But when I call trainer.train(), I get this error:

    ValueError: The model did not return a loss from the inputs, only the following keys: logits. For reference, the inputs it received are input_ids,attention_mask.

brand17 • May 12 '24 12:05

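I think you need to use DataCollatorForLanguageModeling or DataCollatorForSeq2Seq instead. DataCollatorWithPadding only pads input_ids and attention_mask and does not add labels, so the model has nothing to compute a loss against.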

danielhanchen • May 13 '24 10:05
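
For anyone hitting the same error, here is a rough pure-Python sketch of what a causal-LM collator does beyond padding: it copies input_ids into labels and marks the padded positions with -100 so the loss skips them. The helper name collate_for_causal_lm is hypothetical, and the sketch assumes right padding. It is not the library's implementation. In the notebook itself, the corresponding change would presumably be passing DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False) as data_collator.

```python
# -100 is the default ignore_index of PyTorch's cross-entropy loss.
IGNORE_INDEX = -100


def collate_for_causal_lm(features, pad_token_id):
    """Pad a batch on the right and add `labels`.

    Hypothetical helper for illustration only, not a transformers function.
    """
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        ids = list(f["input_ids"])
        pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_token_id] * pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * pad)
        # Labels mirror the inputs; padded positions are excluded from the loss.
        batch["labels"].append(ids + [IGNORE_INDEX] * pad)
    return batch


batch = collate_for_causal_lm(
    [{"input_ids": [5, 6, 7]}, {"input_ids": [8, 9]}],
    pad_token_id=0,
)
print(sorted(batch))    # ['attention_mask', 'input_ids', 'labels']
print(batch["labels"])  # [[5, 6, 7], [8, 9, -100]]
```

With a labels key in the batch, the model can return a loss, which is exactly what the ValueError above says is missing. One simplification to be aware of: the real DataCollatorForLanguageModeling masks every label equal to the pad token id, not just the padded tail, so results can differ when the pad token doubles as the EOS token.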