AttributeError: 'DataParallel' object has no attribute 'device' when trying the Lora-for-sequence-classification-example
System Info
- transformers version: 4.39.3
- Platform: Windows-10-10.0.19045-SP0
- Python version: 3.8.12
- Huggingface_hub version: 0.20.1
- Safetensors version: 0.4.1
- Accelerate version: 0.29.2
- Accelerate config: not found
- PyTorch version (GPU?): 2.2.2+cu121 (True)
I was trying the LoRA for sequence classification example (here), and when I called train(), I got the following error:
roberta_trainer.train()
0%| | 0/1905 [00:00<?, ?it/s]Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\User1\Anaconda3\envs\huggingface\lib\site-packages\transformers\trainer.py", line 1780, in train
return inner_training_loop(
File "C:\Users\User1\Anaconda3\envs\huggingface\lib\site-packages\transformers\trainer.py", line 2118, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "C:\Users\User1\Anaconda3\envs\huggingface\lib\site-packages\transformers\trainer.py", line 3036, in training_step
loss = self.compute_loss(model, inputs)
File "<stdin>", line 8, in compute_loss
File "C:\Users\User1\Anaconda3\envs\huggingface\lib\site-packages\torch\nn\modules\module.py", line 1688, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'DataParallel' object has no attribute 'device'
I wonder how this problem can be fixed. Thanks!
Who can help?
No response
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
I literally executed the code in the example without any modifications:
Lora-for-sequence-classification-with-Roberta-Llama-Mistral
Expected behavior
I expected fine-tuning to begin.
cc @pacman100
cc @younesbelkada
Hi @mrxiaohe, can you share the full traceback of the error? Also, do you still get the same error after updating all HF-related packages?
pip install -U transformers accelerate peft
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@younesbelkada, I have the same problem as mrxiaohe had, even after updating the packages.
/home/makrai/tool/python/astro/lib/python3.12/site-packages/torch/nn/parallel/parallel_apply.py:79: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
with torch.cuda.device(device), torch.cuda.stream(stream), autocast(enabled=autocast_enabled):
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[32], line 1
----> 1 roberta_trainer.train()
File ~/tool/python/astro/lib/python3.12/site-packages/transformers/trainer.py:1938, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1936 hf_hub_utils.enable_progress_bars()
1937 else:
-> 1938 return inner_training_loop(
1939 args=args,
1940 resume_from_checkpoint=resume_from_checkpoint,
1941 trial=trial,
1942 ignore_keys_for_eval=ignore_keys_for_eval,
1943 )
File ~/tool/python/astro/lib/python3.12/site-packages/transformers/trainer.py:2279, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
2276 self.control = self.callback_handler.on_step_begin(args, self.state, self.control)
2278 with self.accelerator.accumulate(model):
-> 2279 tr_loss_step = self.training_step(model, inputs)
2281 if (
2282 args.logging_nan_inf_filter
2283 and not is_torch_xla_available()
2284 and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
2285 ):
2286 # if loss is nan or inf simply add the average of previous logged losses
2287 tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)
File ~/tool/python/astro/lib/python3.12/site-packages/transformers/trainer.py:3318, in Trainer.training_step(self, model, inputs)
3315 return loss_mb.reduce_mean().detach().to(self.args.device)
3317 with self.compute_loss_context_manager():
-> 3318 loss = self.compute_loss(model, inputs)
3320 del inputs
3321 if (
3322 self.args.torch_empty_cache_steps is not None
3323 and self.state.global_step % self.args.torch_empty_cache_steps == 0
3324 ):
Cell In[19], line 9, in WeightedCELossTrainer.compute_loss(self, model, inputs, return_outputs)
7 logits = outputs.get("logits")
8 # Compute custom loss
----> 9 loss_fct = torch.nn.CrossEntropyLoss(weight=torch.tensor([neg_weights, pos_weights], device=model.device, dtype=logits.dtype))
10 loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
11 return (loss, outputs) if return_outputs else loss
File ~/tool/python/astro/lib/python3.12/site-packages/torch/nn/modules/module.py:1729, in Module.__getattr__(self, name)
1727 if name in modules:
1728 return modules[name]
-> 1729 raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'DataParallel' object has no attribute 'device'
Possibly @BenjaminBossan regarding lora/peft?
Sorry for the late reply.
@makrai I think the blog post contains an error in this line:
loss_fct = torch.nn.CrossEntropyLoss(weight=torch.tensor([neg_weights, pos_weights], device=model.device, dtype=logits.dtype))
When multiple GPUs are available, the transformers Trainer automatically wraps the model in DataParallel (which has bitten me in the past too), and DataParallel does not expose the .device attribute, resulting in the error you showed. If you visit the repo, however, you can see that the code has since been changed to use labels.device instead of model.device. That change should fix your issue. Otherwise, consider running on a single GPU.
All of this should be independent of PEFT usage.
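For reference, here is a sketch of what the corrected compute_loss would look like after swapping model.device for labels.device, reconstructed from the traceback above (it assumes neg_weights and pos_weights are defined earlier, as in the blog post's class-weight computation):

```python
import torch
from transformers import Trainer


class WeightedCELossTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # labels is already on the correct device, so this works even when the
        # Trainer has wrapped the model in DataParallel (which has no .device).
        loss_fct = torch.nn.CrossEntropyLoss(
            weight=torch.tensor(
                [neg_weights, pos_weights], device=labels.device, dtype=logits.dtype
            )
        )
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
```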
Thanks, @BenjaminBossan , it seems to work!
Is there a chance that the same issue happens whenever an iteration is skipped because of an exception? It happens to me here: https://github.com/urchade/GLiNER/blob/a865555af53920321c4593f7c091dfc9059e9028/gliner/training/trainer.py#L84
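For what it's worth, in case that trainer also reads model.device, a lookup that tolerates the DataParallel wrapper might look like this (just a sketch; DataParallel exposes the wrapped model via its .module attribute):

```python
import torch


def get_model_device(model: torch.nn.Module) -> torch.device:
    # DataParallel/DistributedDataParallel expose the wrapped model as .module;
    # fall back to the model itself when it is not wrapped.
    unwrapped = getattr(model, "module", model)
    # PreTrainedModel has a .device property; for a plain nn.Module, take the
    # device of its first parameter instead.
    if hasattr(unwrapped, "device"):
        return unwrapped.device
    return next(unwrapped.parameters()).device
```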