
RuntimeError: unscale_() has already been called on this optimizer since the last update().

opyate opened this issue on May 31, 2023 · 5 comments

The tutorial section of the README (https://github.com/artidoro/qlora#tutorials-and-demonstrations) links this fine-tuning notebook: https://colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zbf2-9VQTtGJ24k?usp=sharing

When I run it, I see this error:

RuntimeError: unscale_() has already been called on this optimizer since the last update().

Full trace:

in <cell line: 23>:23                                                                            │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1661 in train                    │
│                                                                                                  │
│   1658 │   │   inner_training_loop = find_executable_batch_size(                                 │
│   1659 │   │   │   self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size  │
│   1660 │   │   )                                                                                 │
│ ❱ 1661 │   │   return inner_training_loop(                                                       │
│   1662 │   │   │   args=args,                                                                    │
│   1663 │   │   │   resume_from_checkpoint=resume_from_checkpoint,                                │
│   1664 │   │   │   trial=trial,                                                                  │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:2008 in _inner_training_loop     │
│                                                                                                  │
│   2005 │   │   │   │   │   │   │   │   args.max_grad_norm,                                       │
│   2006 │   │   │   │   │   │   │   )                                                             │
│   2007 │   │   │   │   │   │   else:                                                             │
│ ❱ 2008 │   │   │   │   │   │   │   self.accelerator.clip_grad_norm_(                             │
│   2009 │   │   │   │   │   │   │   │   model.parameters(),                                       │
│   2010 │   │   │   │   │   │   │   │   args.max_grad_norm,                                       │
│   2011 │   │   │   │   │   │   │   )                                                             │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py:1877 in clip_grad_norm_        │
│                                                                                                  │
│   1874 │   │   │   # `accelerator.backward(loss)` is doing that automatically. Therefore, its i  │
│   1875 │   │   │   # We cannot return the gradient norm because DeepSpeed does it.               │
│   1876 │   │   │   return None                                                                   │
│ ❱ 1877 │   │   self.unscale_gradients()                                                          │
│   1878 │   │   return torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=norm_type)  │
│   1879 │                                                                                         │
│   1880 │   def clip_grad_value_(self, parameters, clip_value):                                   │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py:1840 in unscale_gradients      │
│                                                                                                  │
│   1837 │   │   │   for opt in optimizer:                                                         │
│   1838 │   │   │   │   while isinstance(opt, AcceleratedOptimizer):                              │
│   1839 │   │   │   │   │   opt = opt.optimizer                                                   │
│ ❱ 1840 │   │   │   │   self.scaler.unscale_(opt)                                                 │
│   1841 │                                                                                         │
│   1842 │   def clip_grad_norm_(self, parameters, max_norm, norm_type=2):                         │
│   1843 │   │   """                                                                               │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py:275 in unscale_            │
│                                                                                                  │
│   272 │   │   optimizer_state = self._per_optimizer_states[id(optimizer)]                        │
│   273 │   │                                                                                      │
│   274 │   │   if optimizer_state["stage"] is OptState.UNSCALED:                                  │
│ ❱ 275 │   │   │   raise RuntimeError("unscale_() has already been called on this optimizer sin   │
│   276 │   │   elif optimizer_state["stage"] is OptState.STEPPED:                                 │
│   277 │   │   │   raise RuntimeError("unscale_() is being called after step().")                 │
│   278                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: unscale_() has already been called on this optimizer since the last update().
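
For context: the error comes from PyTorch AMP's `GradScaler`, which allows `unscale_()` to be called at most once per optimizer between `update()` calls; the `Trainer` regression this thread hits (fixed in huggingface/transformers#23914) effectively calls it twice per step. A minimal standalone sketch of the contract (illustrative names, not qlora code):

```python
# Minimal AMP training step showing the GradScaler contract behind this error.
import torch

model = torch.nn.Linear(8, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(4, 8, device="cuda")
with torch.autocast("cuda"):
    loss = model(x).sum()

scaler.scale(loss).backward()
scaler.unscale_(optimizer)   # allowed once per step, e.g. before gradient clipping
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
# scaler.unscale_(optimizer) # a second call here raises exactly this RuntimeError
scaler.step(optimizer)
scaler.update()              # resets the per-optimizer state for the next step
```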

[screenshot of the error output in Colab]

opyate · May 31, 2023

I got the same error.

jaehyeongAN · May 31, 2023

same here

Chris2aa · May 31, 2023

same problem

zw-SIMM · May 31, 2023

I found a solution here: https://github.com/huggingface/transformers/issues/23905. Pin transformers to an earlier commit:

!pip install git+https://github.com/huggingface/transformers@de9255de27abfcae4a1f816b904915f0b1e23cd9

instead of installing the current main branch:

!pip install -q -U git+https://github.com/huggingface/transformers.git

Chris2aa · May 31, 2023
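
One caveat with the pinned install above: Colab keeps the already-imported package until the runtime is restarted, so a quick sanity check (a suggestion, not from the linked issue) is to confirm which build is actually active:

```python
# After (re)installing and restarting the Colab runtime,
# confirm the active transformers build before retrying training.
import transformers
print(transformers.__version__)  # should reflect the freshly installed version
```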

This should be fixed by https://github.com/huggingface/transformers/pull/23914. If you re-install transformers from source, the error should disappear.

younesbelkada · May 31, 2023

> This should be fixed by huggingface/transformers#23914. If you re-install transformers from source, the error should disappear.

Thanks, the Colab ran without issues this time :)

opyate · June 5, 2023

Wasted SO MUCH time!!!!

thusinh1969 · June 14, 2023