
Error in image generation: PytorchStreamReader failed reading zip archive: failed finding central directory

Open · lennartbrandin opened this issue · 1 comment

Issue

I get the following error when trying to generate a picture: RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

General Information

Package: enfugue-server-0.2.0-manylinux-cuda-x86_64.tar.gz.*
OS: Arch Linux (kernel 6.4.3-zen1-2-zen)
GPU: AMD RX 5700

Log

tail -f ~/.cache/enfugue.log 
2023-08-07 19:38:26,515 [cherrypy.error] INFO (_cplogging.py:213) [07/Aug/2023:19:38:26] ENGINE Bus STARTING
2023-08-07 19:38:26,622 [cherrypy.error] INFO (_cplogging.py:213) [07/Aug/2023:19:38:26] ENGINE Serving on https://0.0.0.0:45554
2023-08-07 19:38:26,622 [cherrypy.error] INFO (_cplogging.py:213) [07/Aug/2023:19:38:26] ENGINE Bus STARTED
2023-08-07 19:38:28,690 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:38:28,696 [pibble] ERROR (__init__.py:232) Error handler raised exception DetachedInstanceError(Instance <AuthenticationTokenDeclarative at 0x7f20137d6d10> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: https://sqlalche.me/e/14/bhk3))
2023-08-07 19:38:38,589 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:38:48,592 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:38:59,184 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:39:09,137 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:39:19,195 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:39:28,479 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:39:38,526 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:39:49,552 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:39:52,055 [enfugue] ERROR (engine.py:259) Traceback (most recent call last):
  File "enfugue/diffusion/process.py", line 360, in run
  File "enfugue/diffusion/process.py", line 112, in execute_diffusion_plan
  File "enfugue/diffusion/plan.py", line 698, in execute
  File "enfugue/diffusion/plan.py", line 911, in execute_nodes
  File "enfugue/diffusion/plan.py", line 542, in execute
  File "enfugue/diffusion/plan.py", line 443, in execute
  File "enfugue/diffusion/manager.py", line 2819, in __call__
  File "enfugue/diffusion/manager.py", line 2181, in pipeline
  File "enfugue/diffusion/pipeline.py", line 204, in from_ckpt
  File "torch/serialization.py", line 995, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "torch/serialization.py", line 449, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

2023-08-07 19:39:59,601 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'

lennartbrandin · Aug 07 '23

Hi @lennartbrandin, so glad you're back!

First, this specific error message suggests the .ckpt file Enfugue downloaded is corrupt or truncated. If you're using the default model, check your downloaded copy at ~/.cache/enfugue/checkpoint/v1-5-pruned.ckpt and make sure its md5sum is fde08ee6f4fac7ab26592bf519cbb405. If it isn't, delete the file and Enfugue will download it again.
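
If it's useful, here's a small sketch of that check as a script, using the path and checksum from the paragraph above (the helper name `file_md5` is just for illustration):

```python
import hashlib
from pathlib import Path

# Path and expected checksum from the comment above.
CKPT = Path.home() / ".cache/enfugue/checkpoint/v1-5-pruned.ckpt"
EXPECTED_MD5 = "fde08ee6f4fac7ab26592bf519cbb405"

def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through md5 so a multi-GB checkpoint fits in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    if not CKPT.exists():
        print(f"{CKPT} not found")
    elif file_md5(CKPT) != EXPECTED_MD5:
        print("Checksum mismatch - delete the file and let Enfugue re-download it")
        # CKPT.unlink()  # uncomment to delete automatically
    else:
        print("Checkpoint OK")
```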

Next, I can already tell from your logs that Enfugue is unlikely to be talking to your GPU. I did some work on AMD GPU compatibility, but I only have V520s available to me, and those are technically unsupported, so it's been difficult to nail down consistent generic steps for AMD users.

The fact that it's falling back to nvidia-smi tells me it can't find rocm-smi, which is AMD's equivalent. Have you installed any ROCm drivers? I didn't mention this anywhere in my install instructions. If you've managed to get Stable Diffusion working with some other UI, you've probably already done this, but if not, it's an essential part of getting AMD devices to do AI/ML work in general. Here is some documentation on that; the TL;DR is you need to run amdgpu-install --usecase=rocm. You'll also need to make sure whatever user you're running Enfugue as is a member of the video and render groups, and restart your machine after modifying group memberships (for some reason the change doesn't seem to take effect otherwise).
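
As shell commands, that setup amounts to something like the following; this is a sketch assuming AMD's amdgpu-install helper is already on your system and you're running Enfugue as your login user:

```shell
# Install the ROCm userspace stack via AMD's installer helper.
sudo amdgpu-install --usecase=rocm

# Add the user that runs Enfugue to the groups that grant GPU access.
sudo usermod -aG video,render "$USER"

# Group changes only apply on a fresh login; a reboot is the safe option.
sudo reboot
```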

I'm sorry to say I have no idea which version(s) of ROCm will work with your hardware. There are many variables in play with architectures and LLVM targets that are totally unfamiliar to me, and it's entirely possible that no combination will work, at least with the build you have.

If you're willing to go the extra mile, your best bet for compatibility is going to be starting from source, and working with versions of Torch+ROCm until something sticks - this is how I was trying to get the V520 to work:

  1. Install Miniconda (Arch Link)
  2. Download this environment file anywhere on your machine.
  3. Run conda env create -f <file_downloaded_above>. This will take 5-15 minutes and install roughly 3 GB of dependencies.
  4. Run conda activate enfugue to set up your shell to use the conda-installed binaries and libraries.
  5. Run pip install enfugue==0.2.0 --no-deps to get Enfugue and not try to install any other packages.

You should now have an Enfugue installation, though with no GPU acceleration yet. If you simply run enfugue version, this should be the output:

$ enfugue version
Enfugue v.0.2.0
Torch v.2.1.0.dev20230803+cpu

AI/ML Capabilities:
---------------------
Device type: cpu
CUDA: Unavailable
DirectML: Unavailable
MPS: Unavailable

Now what we want to do is install the version of Torch that matches the version of ROCm installed earlier. You can get the version like this:

$ apt show rocm-libs -a
Package: rocm-libs
Version: 5.6.0.50600-67~22.04
...

Install the Torch version that matches it like so. You can ignore the dependency-conflict errors pip prints at the end.

$ pip install torch==2.1.0.dev20230803+rocm5.6 --extra-index-url https://download.pytorch.org/whl/nightly
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com, https://download.pytorch.org/whl/nightly
Collecting torch==2.1.0.dev20230803+rocm5.6
  Downloading https://download.pytorch.org/whl/nightly/rocm5.6/torch-2.1.0.dev20230803%2Brocm5.6-cp310-cp310-linux_x86_64.whl (1590.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 GB 285.5 MB/s eta 0:00:00
Requirement already satisfied: filelock in /home/ubuntu/miniconda3/envs/enfugue-test/lib/python3.10/site-packages (from torch==2.1.0.dev20230803+rocm5.6) (3.12.2)
Requirement already satisfied: typing-extensions in /home/ubuntu/miniconda3/envs/enfugue-test/lib/python3.10/site-packages (from torch==2.1.0.dev20230803+rocm5.6) (4.7.1)
Requirement already satisfied: sympy in /home/ubuntu/miniconda3/envs/enfugue-test/lib/python3.10/site-packages (from torch==2.1.0.dev20230803+rocm5.6) (1.12)
Requirement already satisfied: networkx in /home/ubuntu/miniconda3/envs/enfugue-test/lib/python3.10/site-packages (from torch==2.1.0.dev20230803+rocm5.6) (3.1)
Requirement already satisfied: jinja2 in /home/ubuntu/miniconda3/envs/enfugue-test/lib/python3.10/site-packages (from torch==2.1.0.dev20230803+rocm5.6) (3.1.2)
Requirement already satisfied: fsspec in /home/ubuntu/miniconda3/envs/enfugue-test/lib/python3.10/site-packages (from torch==2.1.0.dev20230803+rocm5.6) (2023.6.0)
Requirement already satisfied: MarkupSafe>=2.0 in /home/ubuntu/miniconda3/envs/enfugue-test/lib/python3.10/site-packages (from jinja2->torch==2.1.0.dev20230803+rocm5.6) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /home/ubuntu/miniconda3/envs/enfugue-test/lib/python3.10/site-packages (from sympy->torch==2.1.0.dev20230803+rocm5.6) (1.3.0)
Installing collected packages: torch
  Attempting uninstall: torch
    Found existing installation: torch 2.1.0.dev20230803+cpu
    Uninstalling torch-2.1.0.dev20230803+cpu:
      Successfully uninstalled torch-2.1.0.dev20230803+cpu
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
enfugue 0.2.0 requires diffusers<0.19,>=0.18, but you have diffusers 0.20.0.dev0 which is incompatible.
enfugue 0.2.0 requires einops<0.4,>=0.3, but you have einops 0.6.1 which is incompatible.
enfugue 0.2.0 requires invisible-watermark<0.2,>=0.1, but you have invisible-watermark 0.2.0 which is incompatible.

You can find all the available versions by browsing https://download.pytorch.org/whl/nightly/torch/. After you've installed this and everything is set up correctly, the output of enfugue version should change, indicating that CUDA is now available (Torch's ROCm builds report themselves through the CUDA interface).
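
To make the version-matching step concrete, here's a small sketch that turns the apt-reported ROCm version into the +rocmX.Y local-version tag used in the nightly wheel names. The helper names are just for illustration, and the nightly date is the example from above, not something to hard-code:

```python
def rocm_wheel_tag(rocm_version: str) -> str:
    """Reduce a full ROCm package version (e.g. '5.6.0.50600-67~22.04')
    to the major.minor tag PyTorch wheels use (e.g. 'rocm5.6')."""
    major, minor = rocm_version.split(".")[:2]
    return f"rocm{major}.{minor}"

def torch_requirement(torch_version: str, rocm_version: str) -> str:
    """Build the pip requirement string for a matching nightly wheel."""
    return f"torch=={torch_version}+{rocm_wheel_tag(rocm_version)}"

print(torch_requirement("2.1.0.dev20230803", "5.6.0.50600-67~22.04"))
# -> torch==2.1.0.dev20230803+rocm5.6
```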

$ enfugue version
Failed to parse CPUID
Enfugue v.0.2.0
Torch v.2.1.0.dev20230803+rocm5.6

AI/ML Capabilities:
---------------------
Device type: cuda
CUDA: Ready
TensorRT: Unavailable
DirectML: Unavailable
MPS: Unavailable

You can now execute enfugue run and the server should start; at this point things may work for you. This is where I started getting segmentation faults on the V520 and stalled out. I suspect you won't hit segmentation faults, but there may be other errors.

I'm sorry this is such a pain. The infrastructure around AMD AI/ML is fairly immature compared to Nvidia's, and it's difficult for me to get hold of hardware to test on. I'm very grateful for any effort you put into this - I'd love to get this working at least somewhat consistently for AMD users.

painebenjamin · Aug 07 '23