Setting compute_metrics in Trainer with Idefics2ForConditionalGeneration leads to AttributeError: 'DynamicCache' object has no attribute 'detach'
System Info
- `transformers` version: 4.41.0.dev0
- Platform: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
- Python version: 3.11.8
- Huggingface_hub version: 0.20.3
- Safetensors version: 0.4.2
- Accelerate version: 0.28.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.2.2+cu121 (True)
- Tensorflow version (GPU?): 2.16.1 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help?
Not sure if this is an issue with the Trainer or the model.
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
The following code is from the Idefics2 fine-tuning example Colab, with the addition of compute_metrics in the Trainer.
```
!pip install -q git+https://github.com/huggingface/transformers.git
!pip install -q accelerate datasets peft bitsandbytes
```
```python
import torch
from peft import LoraConfig
from transformers import AutoProcessor, BitsAndBytesConfig, Idefics2ForConditionalGeneration

DEVICE = "cuda:0"
USE_LORA = False
USE_QLORA = True

processor = AutoProcessor.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    do_image_splitting=False
)

# Three options for training, from the lowest precision training to the highest precision training:
# - QLora
# - Standard Lora
# - Full fine-tuning
if USE_QLORA or USE_LORA:
    lora_config = LoraConfig(
        r=8,
        lora_alpha=8,
        lora_dropout=0.1,
        target_modules='.*(text_model|modality_projection|perceiver_resampler).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$',
        use_dora=False if USE_QLORA else True,
        init_lora_weights="gaussian"
    )
    if USE_QLORA:
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.float16
        )
    model = Idefics2ForConditionalGeneration.from_pretrained(
        "HuggingFaceM4/idefics2-8b",
        torch_dtype=torch.float16,
        quantization_config=bnb_config if USE_QLORA else None,
    )
    model.add_adapter(lora_config)
    model.enable_adapters()
else:
    model = Idefics2ForConditionalGeneration.from_pretrained(
        "HuggingFaceM4/idefics2-8b",
        torch_dtype=torch.float16,
        _attn_implementation="flash_attention_2",  # Only available on A100 or H100
    ).to(DEVICE)
```
```python
from datasets import load_dataset

train_dataset = load_dataset("nielsr/docvqa_1200_examples", split="train")
train_dataset = train_dataset.remove_columns(['id', 'words', 'bounding_boxes', 'answer'])

eval_dataset = load_dataset("nielsr/docvqa_1200_examples", split="test")
eval_dataset = eval_dataset.remove_columns(['id', 'words', 'bounding_boxes', 'answer'])
```
```python
import random

class MyDataCollator:
    def __init__(self, processor):
        self.processor = processor
        self.image_token_id = processor.tokenizer.additional_special_tokens_ids[
            processor.tokenizer.additional_special_tokens.index("<image>")
        ]

    def __call__(self, examples):
        texts = []
        images = []
        for example in examples:
            image = example["image"]
            question = example["query"]["en"]
            answer = random.choice(example["answers"])
            messages = [
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "Answer briefly."},
                        {"type": "image"},
                        {"type": "text", "text": question}
                    ]
                },
                {
                    "role": "assistant",
                    "content": [
                        {"type": "text", "text": answer}
                    ]
                }
            ]
            text = processor.apply_chat_template(messages, add_generation_prompt=False)
            texts.append(text.strip())
            images.append([image])

        batch = processor(text=texts, images=images, return_tensors="pt", padding=True)

        labels = batch["input_ids"].clone()
        labels[labels == processor.tokenizer.pad_token_id] = self.image_token_id
        batch["labels"] = labels

        return batch

data_collator = MyDataCollator(processor)
```
```python
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    num_train_epochs=2,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
    warmup_steps=50,
    learning_rate=1e-4,
    weight_decay=0.01,
    logging_steps=25,
    output_dir="/content/drive/My Drive/docvqa_ft_tutorial",
    save_strategy="steps",
    save_steps=250,
    save_total_limit=1,
    # evaluation_strategy="epoch",
    fp16=True,
    push_to_hub_model_id="idefics2-8b-docvqa-finetuned-tutorial",
    remove_unused_columns=False,
    report_to="none",
)

def custom_metrics(eval, preds):
    exit(0)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=custom_metrics,
)

trainer.evaluate()
```
Here is the exception:

```
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/home/eyel/pm-ia-traitement-documents/src/python/notebooks/template.ipynb Cell 36 line 1
----> 1 trainer.evaluate()
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3513, in Trainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
3510 start_time = time.time()
3512 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3513 output = eval_loop(
3514 eval_dataloader,
3515 description="Evaluation",
3516 # No point gathering the predictions if there are no metrics, otherwise we defer to
3517 # self.args.prediction_loss_only
3518 prediction_loss_only=True if self.compute_metrics is None else None,
3519 ignore_keys=ignore_keys,
3520 metric_key_prefix=metric_key_prefix,
3521 )
3523 total_batch_size = self.args.eval_batch_size * self.args.world_size
3524 if f"{metric_key_prefix}_jit_compilation_time" in output.metrics:
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3696, in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
3693 batch_size = observed_batch_size
3695 # Prediction step
-> 3696 loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
3697 main_input_name = getattr(self.model, "main_input_name", "input_ids")
3698 inputs_decode = self._prepare_input(inputs[main_input_name]) if args.include_inputs_for_metrics else None
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3904, in Trainer.prediction_step(self, model, inputs, prediction_loss_only, ignore_keys)
3902 return (loss, None, None)
3903 print(logits) #Eloi Remove
-> 3904 logits = nested_detach(logits)
3905 if len(logits) == 1:
3906 logits = logits[0]
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:190, in nested_detach(tensors)
188 "Detach `tensors` (even if it's a nested list/tuple/dict of tensors)."
189 if isinstance(tensors, (list, tuple)):
--> 190 return type(tensors)(nested_detach(t) for t in tensors)
191 elif isinstance(tensors, Mapping):
192 return type(tensors)({k: nested_detach(t) for k, t in tensors.items()})
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:190, in <genexpr>(.0)
188 "Detach `tensors` (even if it's a nested list/tuple/dict of tensors)."
189 if isinstance(tensors, (list, tuple)):
--> 190 return type(tensors)(nested_detach(t) for t in tensors)
191 elif isinstance(tensors, Mapping):
192 return type(tensors)({k: nested_detach(t) for k, t in tensors.items()})
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:193, in nested_detach(tensors)
191 elif isinstance(tensors, Mapping):
192 return type(tensors)({k: nested_detach(t) for k, t in tensors.items()})
--> 193 return tensors.detach()
AttributeError: 'DynamicCache' object has no attribute 'detach'
```
This seems to happen when the model output's past_key_values is an empty DynamicCache.
Expected behavior
Evaluation should reach custom_metrics and terminate cleanly.
I had the same error and fixed it by using `model.config.use_cache=False` during training. But @VictorSanh might know a better option
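A minimal sketch of that workaround, applied to the reproduction above (assuming the `model` and `trainer` objects already exist; re-enabling the cache afterwards is optional):

```python
# Disable the KV cache so the model's eval outputs contain plain tensors
# instead of a DynamicCache object that nested_detach cannot handle.
model.config.use_cache = False

trainer.evaluate()

# Optionally turn the cache back on afterwards, e.g. before calling model.generate().
model.config.use_cache = True
```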
> I had the same error and fixed it by using `model.config.use_cache=False` during training
That fixes this issue, as the past_key_values are now regular tensors, but it leads to a new error:

```
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/home/eyel/pm-ia-traitement-documents/src/python/notebooks/Idefics2_Fine_tuning_example.ipynb Cell 9 line 1
----> 1 trainer.evaluate()
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3513, in Trainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
3510 start_time = time.time()
3512 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3513 output = eval_loop(
3514 eval_dataloader,
3515 description="Evaluation",
3516 # No point gathering the predictions if there are no metrics, otherwise we defer to
3517 # self.args.prediction_loss_only
3518 prediction_loss_only=True if self.compute_metrics is None else None,
3519 ignore_keys=ignore_keys,
3520 metric_key_prefix=metric_key_prefix,
3521 )
3523 total_batch_size = self.args.eval_batch_size * self.args.world_size
3524 if f"{metric_key_prefix}_jit_compilation_time" in output.metrics:
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3716, in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
3714 logits = self.preprocess_logits_for_metrics(logits, labels)
3715 logits = self.gather_function((logits))
-> 3716 all_preds.add(logits)
3717 if labels is not None:
3718 labels = self.accelerator.pad_across_processes(labels, dim=1, pad_index=-100)
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:326, in EvalLoopContainer.add(self, tensors)
324 self.tensors = tensors if self.do_nested_concat else [tensors]
325 elif self.do_nested_concat:
--> 326 self.tensors = nested_concat(self.tensors, tensors, padding_index=self.padding_index)
327 else:
328 self.tensors.append(tensors)
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in nested_concat(tensors, new_tensors, padding_index)
134 assert type(tensors) == type(
135 new_tensors
136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
137 if isinstance(tensors, (list, tuple)):
--> 138 return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
139 elif isinstance(tensors, torch.Tensor):
140 return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in <genexpr>(.0)
134 assert type(tensors) == type(
135 new_tensors
136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
137 if isinstance(tensors, (list, tuple)):
--> 138 return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
139 elif isinstance(tensors, torch.Tensor):
140 return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in nested_concat(tensors, new_tensors, padding_index)
134 assert type(tensors) == type(
135 new_tensors
136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
137 if isinstance(tensors, (list, tuple)):
--> 138 return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
139 elif isinstance(tensors, torch.Tensor):
140 return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in <genexpr>(.0)
134 assert type(tensors) == type(
135 new_tensors
136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
137 if isinstance(tensors, (list, tuple)):
--> 138 return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
139 elif isinstance(tensors, torch.Tensor):
140 return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in nested_concat(tensors, new_tensors, padding_index)
134 assert type(tensors) == type(
135 new_tensors
136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
137 if isinstance(tensors, (list, tuple)):
--> 138 return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
139 elif isinstance(tensors, torch.Tensor):
140 return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in <genexpr>(.0)
134 assert type(tensors) == type(
135 new_tensors
136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
137 if isinstance(tensors, (list, tuple)):
--> 138 return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
139 elif isinstance(tensors, torch.Tensor):
140 return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:140, in nested_concat(tensors, new_tensors, padding_index)
138 return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
139 elif isinstance(tensors, torch.Tensor):
--> 140 return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
141 elif isinstance(tensors, Mapping):
142 return type(tensors)(
143 {k: nested_concat(t, new_tensors[k], padding_index=padding_index) for k, t in tensors.items()}
144 )
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:99, in torch_pad_and_concatenate(tensor1, tensor2, padding_index)
96 tensor2 = atleast_1d(tensor2)
98 if len(tensor1.shape) == 1 or tensor1.shape[1] == tensor2.shape[1]:
---> 99 return torch.cat((tensor1, tensor2), dim=0)
101 # Let's figure out the new shape
102 new_shape = (tensor1.shape[0] + tensor2.shape[0], max(tensor1.shape[1], tensor2.shape[1])) + tensor1.shape[2:]
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 119 but got size 99 for tensor number 1 in the list.
```
Yes, this is due to batches having different lengths of `input_ids`: in the code snippet of your first message you set `padding=True`, which means dynamic padding, so each batch may have a different length. If your eval batch size is smaller than or equal to your training batch size, then it's fine.
It can be fixed either by padding all examples to the same length (e.g. using `padding="max_length", max_length=200, truncation=True`), or by passing the flag `eval_do_concat_batches=False` to the `TrainingArguments`. In the latter case you'll get a list of predictions/labels in the `compute_metrics` function rather than stacked tensors, so you would need to adapt your `compute_metrics` function accordingly.
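Both fixes could look roughly like this (a sketch, not a drop-in replacement; it reuses the `processor`, `texts` and `images` from the collator above, and the `max_length` and `output_dir` values are only placeholders):

```python
from transformers import TrainingArguments

# Option 1: pad every eval batch to the same fixed length inside the collator,
# so the logits gathered from different batches can be concatenated.
batch = processor(
    text=texts,
    images=images,
    return_tensors="pt",
    padding="max_length",
    max_length=200,       # placeholder value, pick one that covers your data
    truncation=True,
)

# Option 2: keep dynamic padding, but tell the Trainer not to concatenate eval
# batches; compute_metrics then receives lists of per-batch arrays instead.
training_args = TrainingArguments(
    output_dir="docvqa_ft_tutorial",   # placeholder
    per_device_eval_batch_size=8,
    eval_do_concat_batches=False,
    remove_unused_columns=False,
    report_to="none",
)
```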
> I had the same error and fixed it by using `model.config.use_cache=False` during training. But @VictorSanh might know a better option
I don't have a better fix!
I think the cache problem should be fixed by converting the DynamicCache back to a legacy cache in Idefics2's backbone language model, like it's already done in Llama.
These changes are partially related to the work on making language models `compile`-compatible, and should be available soon 🤗
Thanks for the explanation @zucchini-nlp! Does this mean that this fix won't be needed soon, or that it enables something which isn't available yet but will be soon?
We discussed the cache input-output format with @gante yesterday. Maybe a llama-format cache is not what we need, but anyway @gante will take care of it 😄
@zucchini-nlp OK. The main thing to know is what, if anything, should be updated in idefics2. Is what @gante is doing addressing this?
@amyeroberts I am not sure what the correct format of the cache objects returned by language models should be, since right now we do not have consistency, so I wanted @gante to look at it.
There are two options for this:
- The language model should always return a tuple-type cache (as Llama currently does), in which case we would only have to update Mistral to follow the same logic.
- The language model should return the same type of cache as it received in `forward`. In that case Idefics2 has to call `cache.to_legacy_cache()` at the end, ensuring it returns a tuple type, which is consistent with how caching works for most current language models (see the sketch below).
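For context, option 2 is roughly what the Llama modeling code did at the time; this is only a sketch, with variable names following `LlamaModel.forward`, not the exact Idefics2 change:

```python
# Sketch of option 2, as in LlamaModel.forward around transformers 4.40:
# if the caller passed a legacy tuple cache (or no cache at all), convert the
# DynamicCache built during the forward pass back to the legacy tuple format
# before returning it in the model output.
next_cache = None
if use_cache:
    next_cache = (
        next_decoder_cache.to_legacy_cache() if use_legacy_cache else next_decoder_cache
    )
```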
Also, I believe we are going to get rid of the tuple-type cache sometime in the future, so cache + Trainer is something to keep in mind for then.
@zucchini-nlp OK, great, thanks for explaining. Let's leave it as-is, and once the cache format is standardized we can propagate this to idefics2 + other models.
Hi @EloiEynard I just uploaded an example notebook for fine-tuning Idefics2 on an image -> JSON dataset here: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Idefics2/Fine_tune_Idefics2_for_JSON_extraction_use_cases_(PyTorch_Lightning).ipynb
Thanks @NielsRogge, I got it all figured out with the Trainer and am currently fine-tuning with my custom eval. Wish I had known about Lightning earlier though, it seems more explicit.
By the way, if you don't mind me asking, I've noticed that in your notebooks you use

```python
model.add_adapter(lora_config)
model.enable_adapters()
```

whereas I mostly used to see `model = get_peft_model(model, lora_config)`.

Is there any difference between these two? Thanks
I had the same question; it turns out both are equivalent. The `get_peft_model` API is recommended as it returns a `PeftModel`, which has additional utility methods such as `save_adapter()` with support for saving resized embedding layers. I tried leveraging it, but for some reason it gave me out-of-memory errors which I did not encounter with `add_adapter`. This could be due to PyTorch Lightning, the fact that I was using a notebook, or something else.
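Roughly, the two equivalent set-ups look like this (a sketch reusing the `model` and `lora_config` from the reproduction above; the toggle is just for illustration):

```python
from peft import get_peft_model

USE_PEFT_WRAPPER = True  # illustrative toggle between the two equivalent set-ups

if USE_PEFT_WRAPPER:
    # PEFT API: wraps the base model in a PeftModel, which adds utilities
    # such as print_trainable_parameters().
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()
else:
    # transformers integration: attaches and activates the adapter in place.
    model.add_adapter(lora_config)
    model.enable_adapters()
```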
I'm currently looking into creating a similar notebook that leverages the Trainer API with `get_peft_model`. The reason I used PyTorch Lightning is that it allowed me to get up and running very quickly, especially regarding computing metrics during evaluation.
I see, thanks for the details!