
AttributeError: 'DataParallel' object has no attribute 'device' when trying the Lora-for-sequence-classification-example


System Info

  • transformers version: 4.39.3
  • Platform: Windows-10-10.0.19045-SP0
  • Python version: 3.8.12
  • Huggingface_hub version: 0.20.1
  • Safetensors version: 0.4.1
  • Accelerate version: 0.29.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.2+cu121 (True)

I was trying the LoRA for sequence classification example, here, and when I called train(), I got the following error:

roberta_trainer.train()
  0%|          | 0/1905 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\User1\Anaconda3\envs\huggingface\lib\site-packages\transformers\trainer.py", line 1780, in train
    return inner_training_loop(
  File "C:\Users\User1\Anaconda3\envs\huggingface\lib\site-packages\transformers\trainer.py", line 2118, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "C:\Users\User1\Anaconda3\envs\huggingface\lib\site-packages\transformers\trainer.py", line 3036, in training_step
    loss = self.compute_loss(model, inputs)
  File "<stdin>", line 8, in compute_loss
  File "C:\Users\User1\Anaconda3\envs\huggingface\lib\site-packages\torch\nn\modules\module.py", line 1688, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'DataParallel' object has no attribute 'device'

I wonder how this problem can be fixed. Thanks!

Who can help?

No response

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

I literally executed the code in the example without any modifications:

Lora-for-sequence-classification-with-Roberta-Llama-Mistral

Expected behavior

I expected fine-tuning to begin.

mrxiaohe avatar Apr 24 '24 02:04 mrxiaohe

cc @pacman100

amyeroberts avatar Apr 24 '24 09:04 amyeroberts

cc @younesbelkada

amyeroberts avatar May 24 '24 09:05 amyeroberts

Hi @mrxiaohe, can you share the full traceback of the error? Also, do you still get the same error when all HF-related packages are updated?

pip install -U transformers accelerate peft

younesbelkada avatar May 24 '24 12:05 younesbelkada

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jun 18 '24 08:06 github-actions[bot]

@younesbelkada , I have the same problem as mrxiaohe had, even after updating the packages.

/home/makrai/tool/python/astro/lib/python3.12/site-packages/torch/nn/parallel/parallel_apply.py:79: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.device(device), torch.cuda.stream(stream), autocast(enabled=autocast_enabled):

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[32], line 1
----> 1 roberta_trainer.train()

File ~/tool/python/astro/lib/python3.12/site-packages/transformers/trainer.py:1938, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1936         hf_hub_utils.enable_progress_bars()
   1937 else:
-> 1938     return inner_training_loop(
   1939         args=args,
   1940         resume_from_checkpoint=resume_from_checkpoint,
   1941         trial=trial,
   1942         ignore_keys_for_eval=ignore_keys_for_eval,
   1943     )

File ~/tool/python/astro/lib/python3.12/site-packages/transformers/trainer.py:2279, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   2276     self.control = self.callback_handler.on_step_begin(args, self.state, self.control)
   2278 with self.accelerator.accumulate(model):
-> 2279     tr_loss_step = self.training_step(model, inputs)
   2281 if (
   2282     args.logging_nan_inf_filter
   2283     and not is_torch_xla_available()
   2284     and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
   2285 ):
   2286     # if loss is nan or inf simply add the average of previous logged losses
   2287     tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)

File ~/tool/python/astro/lib/python3.12/site-packages/transformers/trainer.py:3318, in Trainer.training_step(self, model, inputs)
   3315     return loss_mb.reduce_mean().detach().to(self.args.device)
   3317 with self.compute_loss_context_manager():
-> 3318     loss = self.compute_loss(model, inputs)
   3320 del inputs
   3321 if (
   3322     self.args.torch_empty_cache_steps is not None
   3323     and self.state.global_step % self.args.torch_empty_cache_steps == 0
   3324 ):

Cell In[19], line 9, in WeightedCELossTrainer.compute_loss(self, model, inputs, return_outputs)
      7 logits = outputs.get("logits")
      8 # Compute custom loss
----> 9 loss_fct = torch.nn.CrossEntropyLoss(weight=torch.tensor([neg_weights, pos_weights], device=model.device, dtype=logits.dtype))
     10 loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
     11 return (loss, outputs) if return_outputs else loss

File ~/tool/python/astro/lib/python3.12/site-packages/torch/nn/modules/module.py:1729, in Module.__getattr__(self, name)
   1727     if name in modules:
   1728         return modules[name]
-> 1729 raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

AttributeError: 'DataParallel' object has no attribute 'device'

makrai avatar Sep 02 '24 15:09 makrai

Possibly @BenjaminBossan regarding LoRA/PEFT?

amyeroberts avatar Sep 02 '24 15:09 amyeroberts

Sorry for the late reply.

@makrai I think the blog post contains an error in this line:

loss_fct = torch.nn.CrossEntropyLoss(weight=torch.tensor([neg_weights, pos_weights], device=model.device, dtype=logits.dtype))

When multiple GPUs are available, the transformers Trainer automatically wraps the model in DataParallel (which has bitten me in the past too), and DataParallel does not expose a .device attribute, resulting in the error you showed. If you visit the repo now, however, you can see that the code has been changed to use labels.device instead of model.device. That change should fix your issue. Otherwise, consider using a single GPU.
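For reference, here is a minimal sketch of the corrected compute_loss. The lines not visible in your traceback are assumed from the blog post's pattern, and neg_weights/pos_weights are the class weights computed earlier in the notebook:

import torch
from transformers import Trainer

class WeightedCELossTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        # Forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # Build the class weights on the labels' device rather than model.device:
        # labels.device is always defined, even when Trainer wraps the model in
        # DataParallel, which exposes no .device attribute.
        loss_fct = torch.nn.CrossEntropyLoss(
            weight=torch.tensor(
                [neg_weights, pos_weights], device=labels.device, dtype=logits.dtype
            )
        )
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss

If you prefer to avoid DataParallel altogether, restricting the process to one GPU (e.g. CUDA_VISIBLE_DEVICES=0) also sidesteps the problem.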

All of this should be independent of PEFT usage.
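More generally, if you ever need the device of a model that may or may not be wrapped, reading it off a parameter is a safe pattern (a sketch, not code from the blog post):

import torch

def get_model_device(model: torch.nn.Module) -> torch.device:
    # Works for plain modules and for DataParallel/DistributedDataParallel
    # wrappers alike: every nn.Module can iterate its parameters, and each
    # parameter records the device it lives on.
    return next(model.parameters()).device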

BenjaminBossan avatar Sep 06 '24 13:09 BenjaminBossan

Thanks, @BenjaminBossan , it seems to work!

makrai avatar Sep 11 '24 13:09 makrai

Is there a chance that the same issue happens whenever an iteration is skipped because of an exception? It happens to me here: https://github.com/urchade/GLiNER/blob/a865555af53920321c4593f7c091dfc9059e9028/gliner/training/trainer.py#L84

Vomvas avatar Jan 24 '25 08:01 Vomvas