KeyError: 'eval_loss' (LLaMA finetuning)
System Info
- transformers version: 4.28.1
- Platform: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
- Python version: 3.10.6
- Huggingface_hub version: 0.13.3
- Safetensors version: 0.3.0
- PyTorch version (GPU?): 2.0.0+cu117 (True)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes (RTX 3090)
- Using distributed or parallel set-up in script?: No
I'm running into the following error whenever I use a DatasetDict as the evaluation dataset:
Traceback (most recent call last):
File "/mnt/e/alpaca-lora/finetune.py", line 304, in <module>
fire.Fire(train)
File "/home/coen/.local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/coen/.local/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/coen/.local/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/mnt/e/alpaca-lora/finetune.py", line 294, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/home/coen/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
return inner_training_loop(
File "/home/coen/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2006, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/home/coen/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2291, in _maybe_log_save_evaluate
self._save_checkpoint(model, trial, metrics=metrics)
File "/home/coen/.local/lib/python3.10/site-packages/transformers/trainer.py", line 2394, in _save_checkpoint
metric_value = metrics[metric_to_check]
KeyError: 'eval_loss'
Who can help?
@sgugger
Information
- [ ] The official example scripts
- [x] My own modified scripts
Tasks
- [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- Download Alpaca-Lora from the repository
- Modify the code in finetune.py as follows:
if val_data_path is not None:
    train_data = (
        # data.select(range(10)).shuffle().map(generate_and_tokenize_prompt)
        data.shuffle().map(generate_and_tokenize_prompt)
    )
    val_data: DatasetDict = load_from_disk(val_data_path)
    val_data = (
        val_data.map(generate_and_tokenize_prompt)
    )
elif val_set_size > 0:
    train_val = data.train_test_split(
        test_size=val_set_size, shuffle=True, seed=42
    )
    train_data = (
        train_val["train"].shuffle().map(generate_and_tokenize_prompt)
    )
    val_data: Dataset = (
        train_val["test"].shuffle().map(generate_and_tokenize_prompt)
    )
else:
    train_data = data["train"].shuffle().map(generate_and_tokenize_prompt)
    val_data = None

if not ddp and torch.cuda.device_count() > 1:
    # keeps Trainer from trying its own DataParallelism when more than 1 GPU is available
    model.is_parallelizable = True
    model.model_parallel = True

# def compute_metrics(eval_preds):
#     metric = evaluate.load("glue", "mrpc")
#     logits, labels = eval_preds
#     predictions = np.argmax(logits, axis=-1)
#     return metric.compute(predictions=predictions, references=labels)

trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=micro_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        warmup_steps=100,
        num_train_epochs=num_epochs,
        learning_rate=learning_rate,
        fp16=True,
        logging_steps=10,
        optim="adamw_torch",
        evaluation_strategy="steps" if val_set_size > 0 else "no",
        save_strategy="steps",
        eval_steps=200 if val_set_size > 0 else None,
        save_steps=200,
        output_dir=output_dir,
        save_total_limit=3,
        load_best_model_at_end=True if val_set_size > 0 else False,
        ddp_find_unused_parameters=False if ddp else None,
        group_by_length=group_by_length,
        report_to="wandb" if use_wandb else None,
        run_name=wandb_run_name if use_wandb else None,
    ),
    data_collator=transformers.DataCollatorForSeq2Seq(
        tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
    ),
    # compute_metrics=compute_metrics,
)
model.config.use_cache = False
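For clarity, val_data in the first branch is a DatasetDict, i.e. a dict of named splits, so eval_dataset ends up being a mapping from split names to Dataset objects. A minimal sketch of that shape (the split names and contents below are placeholders, not my real data):

from datasets import Dataset, DatasetDict

# Placeholder splits standing in for the real evaluation sets; only the shape
# matters here: eval_dataset becomes a dict that maps split names to Datasets.
val_data = DatasetDict({
    "set_a": Dataset.from_dict({"input_ids": [[1, 2, 3]], "labels": [[1, 2, 3]]}),
    "set_b": Dataset.from_dict({"input_ids": [[4, 5, 6]], "labels": [[4, 5, 6]]}),
})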
Expected behavior
Training proceeds as normal, with separate evaluation on each Dataset in the dict.
[EDIT] The error occurs right after every set has been validated; I can see that it starts training again.
Am I doing something wrong? I don't really see anything wrong with the evaluation datasets I'm using; they work when it's just one big evaluation Dataset object.
If you need more info, please let me know :)
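My best guess at the cause, which is an assumption on my part rather than something I have verified against the Trainer source: when eval_dataset is a dict, each entry seems to be evaluated with its name folded into the metric prefix, so the combined metrics contain keys like eval_<name>_loss but no plain eval_loss, and that is exactly the key load_best_model_at_end (with the default metric_for_best_model="loss") looks up in _save_checkpoint. A minimal sketch of that lookup, with made-up split names and values:

# Sketch of the suspected failure mode; the split names "set_a"/"set_b" and the
# loss values are invented for illustration only.
metrics = {
    "eval_set_a_loss": 1.93,   # one entry per evaluation split, prefix includes the name
    "eval_set_b_loss": 2.41,
}

metric_for_best_model = "loss"                     # Trainer default with load_best_model_at_end=True
metric_to_check = "eval_" + metric_for_best_model  # -> "eval_loss"
metric_value = metrics[metric_to_check]            # KeyError: 'eval_loss', matching the traceback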
Not certain, but this may be related to #22885.
Thanks for the reference; however, the proposed workaround (label_names=["labels"]) did not work.
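For reference, a sketch of how I applied it; the values simply mirror the TrainingArguments from the reproduction above, and the output_dir path is a placeholder:

import transformers

# Attempted workaround from #22885: pass label_names explicitly. The other
# arguments mirror the reproduction snippet; the output_dir is a placeholder.
# This still ended in the same KeyError: 'eval_loss'.
args = transformers.TrainingArguments(
    output_dir="./lora-alpaca-out",
    evaluation_strategy="steps",
    eval_steps=200,
    save_strategy="steps",
    save_steps=200,
    load_best_model_at_end=True,
    label_names=["labels"],
)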
Please post a reproducer we can execute.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Encountered the same issue! I'm trying to train a DistilBERT model on SQuAD v2 with the following command:
CUDA_VISIBLE_DEVICES=0 python3 ./transformers/examples/pytorch/question-answering/run_qa.py \
--model_name_or_path distilbert-base-cased \
--run_name distilbert-base-cased-squad-008 \
--dataset_name squad_v2 \
--do_train \
--do_eval \
--version_2_with_negative \
--learning_rate 3e-4 \
--lr_scheduler_type cosine \
--warmup_ratio 0.1 \
--num_train_epochs 8 \
--max_seq_length 512 \
--doc_stride 128 \
--evaluation_strategy steps \
--save_strategy steps \
--save_total_limit 3 \
--output_dir ./distilbert-base-cased-squad-008 \
--per_device_eval_batch_size 48 \
--per_device_train_batch_size 48 \
--push_to_hub true \
--hub_strategy end \
--hub_token ... \
--hub_private_repo true \
--load_best_model_at_end true
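If this is the same missing-key problem, my guess (again an assumption, I have not traced the script) is that the question-answering evaluation reports keys such as eval_exact and eval_f1 rather than eval_loss, so the default metric_for_best_model has nothing to look up; passing an explicit --metric_for_best_model that matches one of the reported keys (f1, for example) might be worth trying. A tiny sketch of the mismatch:

# Hypothetical illustration, not output from the run above: the squad_v2 metric
# reports keys such as "exact" and "f1" rather than a loss, so after the Trainer
# adds its "eval_" prefix there is again no "eval_loss" entry to look up.
metrics = {"eval_exact": 68.2, "eval_f1": 71.5}   # made-up numbers
metric_value = metrics["eval_loss"]               # raises the same KeyError: 'eval_loss'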
cc @ArthurZucker @Rocketknight1