Unable to correctly install dependencies for Dreambooth example on GCP user or managed notebooks

Open DevonPeroutky opened this issue 2 years ago • 3 comments

Describe the bug

I am trying to run the Dreambooth training example on a GCP Vertex AI Workbench notebook. I have tried both managed and user-managed notebooks, with the same result: I cannot get the dependencies to align correctly.

I installed the dependencies as instructed:

!pip install git+https://github.com/huggingface/diffusers
!pip install -U -r diffusers/examples/dreambooth/requirements.txt

However, when I attempt to initialize an Accelerator environment, I get the following error:

!accelerate env
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 5, in <module>
    from accelerate.commands.accelerate_cli import main
  File "/opt/conda/lib/python3.7/site-packages/accelerate/__init__.py", line 7, in <module>
    from .accelerator import Accelerator
  File "/opt/conda/lib/python3.7/site-packages/accelerate/accelerator.py", line 25, in <module>
    import torch
  File "/home/jupyter/.local/lib/python3.7/site-packages/torch/__init__.py", line 191, in <module>
    _load_global_deps()
  File "/home/jupyter/.local/lib/python3.7/site-packages/torch/__init__.py", line 153, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/opt/conda/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/jupyter/.local/lib/python3.7/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

My environment looks like the following (from pip freeze):

Python: 3.7
---------------
torch==1.13.0
diffusers @ git+https://github.com/huggingface/diffusers@8171566163f0b197282786bf39de95c130eb5fa0
accelerate==0.14.0
torchvision==0.14.0
transformers>=4.21.0
ftfy==6.1.1
tensorboard==2.11.0
modelcards==0.1.6

This seems like a version compatibility issue between accelerate and PyTorch, but I'm not sure of the best way to resolve it. I tried downgrading PyTorch to 1.9.0, as suggested in a Stack Overflow answer, with no luck.
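The traceback shows accelerate loading from /opt/conda/lib/python3.7/site-packages while torch loads from /home/jupyter/.local/lib/python3.7/site-packages, so it may be worth checking whether more than one copy of a package is visible on sys.path. The following is a minimal sketch of such a check; `find_package_dirs` is a hypothetical helper written for illustration, not part of pip, accelerate, or diffusers.

```python
import sys
from pathlib import Path


def find_package_dirs(name, search_paths=None):
    """Return every directory on the search path that contains a package `name`.

    Hypothetical helper for illustration. It only inspects the filesystem and
    never imports the package, so it still works when the import itself fails.
    """
    if search_paths is None:
        search_paths = sys.path
    hits = []
    for entry in search_paths:
        if not entry:
            continue  # '' means the current directory; skipped in this sketch
        candidate = Path(entry) / name
        if (candidate / "__init__.py").is_file() and candidate not in hits:
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for pkg in ("torch", "accelerate", "diffusers"):
        dirs = find_package_dirs(pkg)
        print(pkg, "->", [str(d) for d in dirs] or "not found")
        if len(dirs) > 1:
            print("  more than one copy is visible; normally the first one listed wins")
```

If torch shows up under both the user site directory and the conda directory, the two installs may be shadowing each other, which would be one thing to rule out before changing versions.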

Reproduction

No response

System Info

!diffusers-cli env
Traceback (most recent call last):
  File "/opt/conda/bin/diffusers-cli", line 5, in <module>
    from diffusers.commands.diffusers_cli import main
  File "/opt/conda/lib/python3.7/site-packages/diffusers/__init__.py", line 1, in <module>
    from .utils import (
  File "/opt/conda/lib/python3.7/site-packages/diffusers/utils/__init__.py", line 44, in <module>
    from .testing_utils import (
  File "/opt/conda/lib/python3.7/site-packages/diffusers/utils/testing_utils.py", line 27, in <module>
    import torch
  File "/home/jupyter/.local/lib/python3.7/site-packages/torch/__init__.py", line 191, in <module>
    _load_global_deps()
  File "/home/jupyter/.local/lib/python3.7/site-packages/torch/__init__.py", line 153, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/opt/conda/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/jupyter/.local/lib/python3.7/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11
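The OSError says that libcublas.so.11 needs the symbol cublasLtGetStatusString from libcublasLt.so.11, and that the copy of libcublasLt.so.11 the loader resolved does not export it. One way to investigate is to list every copy of that library in the directories the loader might search. This is a minimal sketch under the assumption, not confirmed in this thread, that a second copy is reachable through LD_LIBRARY_PATH; `candidate_lib_dirs` and `find_libs` are hypothetical helpers.

```python
import os
import sys
from pathlib import Path


def candidate_lib_dirs():
    """Directories where a cuBLAS library might live (illustrative, not exhaustive)."""
    env_value = os.environ.get("LD_LIBRARY_PATH", "")
    dirs = [Path(p) for p in env_value.split(os.pathsep) if p]
    for entry in sys.path:
        if entry:
            # Layout taken from the traceback: site-packages/nvidia/cublas/lib
            dirs.append(Path(entry) / "nvidia" / "cublas" / "lib")
    return dirs


def find_libs(dirs, pattern="libcublasLt.so*"):
    """Return sorted, de-duplicated files under `dirs` whose name matches `pattern`."""
    found = set()
    for d in dirs:
        if d.is_dir():
            found.update(p for p in d.glob(pattern) if p.is_file())
    return sorted(found)


if __name__ == "__main__":
    for lib in find_libs(candidate_lib_dirs()):
        print(lib)
```

If more than one path is printed, comparing the copies may show which one the loader picked up instead of the one shipped next to torch.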

DevonPeroutky, Nov 13 '22 21:11

Hey @DevonPeroutky, there seems to be a problem with your torch install. Can you try just importing PyTorch?

import torch

print(torch.__version__)
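If the bare import fails in the same way, a slightly more informative variant of this check can report where the module would be loaded from before trying to import it. The sketch below uses `importlib.util.find_spec`, which locates a top-level module without executing it; `probe_import` is a hypothetical helper for illustration only.

```python
import importlib
import importlib.util


def probe_import(name):
    """Report where `name` would be loaded from and whether importing it works.

    Hypothetical helper for illustration only.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"name": name, "origin": None, "ok": False, "error": "not installed"}
    report = {"name": name, "origin": spec.origin, "ok": True, "error": None}
    try:
        module = importlib.import_module(name)
        report["version"] = getattr(module, "__version__", None)
    except (ImportError, OSError) as exc:
        # A shared-library failure like the one in the traceback is an OSError.
        report["ok"] = False
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    print(probe_import("torch"))
```

The `origin` field shows which installation Python picked, even when the import itself raises.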

patrickvonplaten, Nov 17 '22 15:11

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot], Dec 14 '22 15:12

I seem to have a similar error; see my issue: https://github.com/huggingface/diffusers/issues/1750

balintdecsi, Dec 18 '22 20:12

I hit the same error. Before I installed xformers, my code ran stably, but after installing it I got this error. I was just following this guide to speed things up: https://huggingface.co/docs/diffusers/optimization/fp16#memory-efficient-attention

leijuzi, Mar 13 '23 08:03

Same error on GCP, PyTorch version: 1.13.1

StateGovernment, Mar 17 '23 06:03