Attempting to unscale FP16 gradients
Describe the bug
The DreamBooth training script fails at the first optimizer step with the error in the title, so training never starts.
Reproduction
No response
Logs
```
Steps: 0%| | 0/800 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/workspace/sdw/examples/dreambooth/train_dreambooth.py", line 812, in <module>
    main(args)
  File "/workspace/sdw/examples/dreambooth/train_dreambooth.py", line 784, in main
    optimizer.step()
  File "/opt/conda/lib/python3.9/site-packages/accelerate/optimizer.py", line 134, in step
    self.scaler.step(self.optimizer, closure)
  File "/opt/conda/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py", line 337, in step
    self.unscale_(optimizer)
  File "/opt/conda/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py", line 282, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
  File "/opt/conda/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py", line 210, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
```
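`GradScaler.unscale_` raises this `ValueError` when a gradient it is asked to unscale is stored as float16, which happens when the parameters handed to the optimizer were themselves cast to fp16 (for example with `.half()`). As a minimal diagnostic sketch, assuming plain PyTorch, the hypothetical helper `find_fp16_trainable_params` (not part of diffusers or accelerate) lists such parameters so the cause can be spotted before training starts:

```python
import torch
from torch import nn


def find_fp16_trainable_params(model: nn.Module) -> list:
    """Hypothetical helper: names of trainable parameters stored in float16.

    GradScaler refuses to unscale fp16 gradients, so any name returned here
    is a likely trigger for "Attempting to unscale FP16 gradients."
    """
    return [
        name
        for name, param in model.named_parameters()
        if param.requires_grad and param.dtype == torch.float16
    ]


if __name__ == "__main__":
    # A toy model cast to half precision reproduces the problematic state.
    model = nn.Linear(4, 2).half()
    print(find_fp16_trainable_params(model))  # ['weight', 'bias']
```

Running it on the module whose parameters go to the optimizer (for DreamBooth, the UNet being fine-tuned) should return an empty list if that module's weights are kept in fp32.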
System Info
My `pip list`: absl-py 1.3.0 accelerate 0.14.0 aiohttp 3.8.3 aiosignal 1.2.0 anyio 3.6.2 argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 asttokens 2.0.5 astunparse 1.6.3 async-timeout 4.0.2 attrs 22.1.0 awscli 1.27.8 Babel 2.11.0 backcall 0.2.0 bash_kernel 0.8.0 bcrypt 4.0.1 beautifulsoup4 4.11.1 bitsandbytes 0.35.4 bleach 5.0.1 botocore 1.29.8 brotlipy 0.7.0 cachetools 5.2.0 certifi 2022.9.24 cffi 1.15.0 chardet 4.0.0 charset-normalizer 2.0.4 click 8.1.3 cmake 3.24.3 colorama 0.4.4 conda 22.9.0 conda-build 3.22.0 conda-content-trust 0+unknown conda-package-handling 1.8.1 contourpy 1.0.6 cryptography 36.0.0 cycler 0.11.0 debugpy 1.6.3 decorator 5.1.1 defusedxml 0.7.1 diffusers 0.8.0.dev0 docutils 0.16 entrypoints 0.4 exceptiongroup 1.0.0 executing 0.8.3 expecttest 0.1.4 fastapi 0.86.0 fastjsonschema 2.16.2 ffmpy 0.3.0 filelock 3.6.0 fonttools 4.38.0 frozenlist 1.3.1 fsspec 2022.10.0 ftfy 6.1.1 future 0.18.2 glob2 0.7 google-auth 2.14.1 google-auth-oauthlib 0.4.6 gradio 3.9 grpcio 1.50.0 h11 0.12.0 httpcore 0.15.0 httpx 0.23.0 huggingface-hub 0.10.1 hypothesis 6.56.4 idna 3.3 importlib-metadata 5.0.0 iniconfig 1.1.1 ipykernel 6.17.1 ipython 8.4.0 ipython-genutils 0.2.0 ipywidgets 8.0.2 jedi 0.18.1 Jinja2 3.1.2 jmespath 1.0.1 json5 0.9.10 jsonschema 4.17.0 jupyter 1.0.0 jupyter-archive 3.3.2 jupyter_client 7.4.5 jupyter-console 6.4.4 jupyter_core 5.0.0 jupyter-http-over-ws 0.0.8 jupyter-server 1.23.2 jupyterlab 3.5.0 jupyterlab-pygments 0.2.2 jupyterlab_server 2.16.3 jupyterlab-widgets 3.0.3 kiwisolver 1.4.4 libarchive-c 2.9 linkify-it-py 1.0.3 Markdown 3.4.1 markdown-it-py 2.1.0 MarkupSafe 2.1.1 matplotlib 3.6.2 matplotlib-inline 0.1.6 mdit-py-plugins 0.3.1 mdurl 0.1.2 mistune 2.0.4 mkl-fft 1.3.1 mkl-random 1.2.2 mkl-service 2.4.0 modelcards 0.1.6 mpmath 1.2.1 multidict 6.0.2 mypy-extensions 0.4.3 natsort 8.2.0 nbclassic 0.4.8 nbclient 0.7.0 nbconvert 7.2.5 nbformat 5.7.0 nbzip 0.1.0 nest-asyncio 1.5.6 notebook 6.5.2 notebook_shim 0.2.2 numpy 1.22.3 oauthlib 3.2.2 
orjson 3.8.1 packaging 21.3 pandas 1.5.1 pandocfilters 1.5.0 paramiko 2.12.0 parso 0.8.3 pexpect 4.8.0 pickleshare 0.7.5 Pillow 9.0.1 pip 21.2.4 pkginfo 1.8.3 platformdirs 2.5.4 pluggy 1.0.0 prometheus-client 0.15.0 prompt-toolkit 3.0.20 protobuf 3.20.3 psutil 5.8.0 ptyprocess 0.7.0 pure-eval 0.2.2 pyasn1 0.4.8 pyasn1-modules 0.2.8 pycosat 0.6.3 pycparser 2.21 pycryptodome 3.15.0 pydantic 1.10.2 pydub 0.25.1 Pygments 2.11.2 PyNaCl 1.5.0 pyOpenSSL 22.0.0 pyparsing 3.0.9 pyre-extensions 0.0.23 pyrsistent 0.19.2 PySocks 1.7.1 pytest 7.2.0 python-dateutil 2.8.2 python-multipart 0.0.5 pytz 2022.1 PyYAML 5.4.1 pyzmq 24.0.1 qtconsole 5.4.0 QtPy 2.3.0 regex 2022.10.31 requests 2.27.1 requests-oauthlib 1.3.1 rfc3986 1.5.0 rsa 4.7.2 ruamel-yaml-conda 0.15.100 s3transfer 0.6.0 Send2Trash 1.8.0 setuptools 61.2.0 six 1.16.0 sniffio 1.3.0 sortedcontainers 2.4.0 soupsieve 2.3.2.post1 stack-data 0.2.0 starlette 0.20.4 sympy 1.11.1 tensorboard 2.11.0 tensorboard-data-server 0.6.1 tensorboard-plugin-wit 1.8.1 terminado 0.17.0 tinycss2 1.2.1 tokenizers 0.13.1 toml 0.10.2 tomli 2.0.1 toolz 0.11.2 torch 1.13.0 torchtext 0.14.0 torchvision 0.14.0 tornado 6.2 tqdm 4.63.0 traitlets 5.5.0 transformers 4.24.0 triton 2.0.0.dev20221105 types-dataclasses 0.6.6 typing_extensions 4.4.0 typing-inspect 0.8.0 uc-micro-py 1.0.1 urllib3 1.26.8 uvicorn 0.19.0 wcwidth 0.2.5 webencodings 0.5.1 websocket-client 1.4.2 websockets 10.4 Werkzeug 2.2.2 wheel 0.37.1 widgetsnbextension 4.0.3 xformers 0.0.14.dev0 yarl 1.8.1 zipp 3.10.0
I've tried this on vast.ai with these machines:
- RTX 3090, CUDA 11.4
- A6000, CUDA 11.7
- `diffusers` version: 0.8.0.dev0
- Platform: Linux-5.4.0-81-generic-x86_64-with-glibc2.27
- Python version: 3.9.12
- PyTorch version (GPU?): 1.13.0 (True)
- Huggingface_hub version: 0.10.1
- Transformers version: 4.24.0
- Using GPU in script?: RTX 3090/A6000 on vast.ai
- Using distributed or parallel set-up in script?: NO
According to the user report at https://github.com/huggingface/diffusers/issues/1246, this is a recently introduced bug in diffusers.
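Until a fix lands, one common way to avoid this error is to keep the module being trained in fp32 and cast only the frozen modules (in DreamBooth, typically the VAE and, when it is not trained, the text encoder) to fp16. This is not necessarily the change that fixed diffusers; it is a minimal PyTorch sketch in which the helper `prepare_for_mixed_precision` and the stub modules are hypothetical stand-ins, not diffusers or accelerate APIs:

```python
from typing import Iterable, List

import torch
from torch import nn


def prepare_for_mixed_precision(
    trainable: nn.Module,
    frozen: Iterable[nn.Module],
    weight_dtype: torch.dtype = torch.float16,
) -> List[nn.Parameter]:
    """Hypothetical helper: keep trained weights in fp32, store frozen ones in half.

    GradScaler refuses to unscale fp16 gradients, so the module handed to the
    optimizer keeps fp32 weights. Frozen modules produce no gradients, so they
    can be stored in `weight_dtype` to save memory.
    """
    trainable.to(dtype=torch.float32)
    for module in frozen:
        module.requires_grad_(False)
        module.to(dtype=weight_dtype)
    # Only these parameters should be passed to the optimizer.
    return [p for p in trainable.parameters() if p.requires_grad]


if __name__ == "__main__":
    trained_stub = nn.Linear(8, 8).half()  # stands in for the model being fine-tuned
    frozen_stub = nn.Linear(8, 8)          # stands in for a frozen model such as the VAE
    params = prepare_for_mixed_precision(trained_stub, [frozen_stub])
    optimizer = torch.optim.AdamW(params, lr=5e-6)
    print([p.dtype for p in params], frozen_stub.weight.dtype)
```

With fp32 trainable weights, half-precision compute is left to autocast, and the gradients reaching `GradScaler.unscale_` are fp32.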