Error in image generation: PytorchStreamReader failed reading zip archive: failed finding central directory
Issue
I get the following error when trying to generate a picture: RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
General Information
- Version: enfugue-server-0.2.0-manylinux-cuda-x86_64.tar.gz.*
- OS: Arch Linux (kernel 6.4.3-zen1-2-zen)
- GPU: AMD RX 5700
Log
tail -f ~/.cache/enfugue.log
2023-08-07 19:38:26,515 [cherrypy.error] INFO (_cplogging.py:213) [07/Aug/2023:19:38:26] ENGINE Bus STARTING
2023-08-07 19:38:26,622 [cherrypy.error] INFO (_cplogging.py:213) [07/Aug/2023:19:38:26] ENGINE Serving on https://0.0.0.0:45554
2023-08-07 19:38:26,622 [cherrypy.error] INFO (_cplogging.py:213) [07/Aug/2023:19:38:26] ENGINE Bus STARTED
2023-08-07 19:38:28,690 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:38:28,696 [pibble] ERROR (__init__.py:232) Error handler raised exception DetachedInstanceError(Instance <AuthenticationTokenDeclarative at 0x7f20137d6d10> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: https://sqlalche.me/e/14/bhk3))
2023-08-07 19:38:38,589 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:38:48,592 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:38:59,184 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:39:09,137 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:39:19,195 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:39:28,479 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:39:38,526 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:39:49,552 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
2023-08-07 19:39:52,055 [enfugue] ERROR (engine.py:259) Traceback (most recent call last):
File "enfugue/diffusion/process.py", line 360, in run
File "enfugue/diffusion/process.py", line 112, in execute_diffusion_plan
File "enfugue/diffusion/plan.py", line 698, in execute
File "enfugue/diffusion/plan.py", line 911, in execute_nodes
File "enfugue/diffusion/plan.py", line 542, in execute
File "enfugue/diffusion/plan.py", line 443, in execute
File "enfugue/diffusion/manager.py", line 2819, in __call__
File "enfugue/diffusion/manager.py", line 2181, in pipeline
File "enfugue/diffusion/pipeline.py", line 204, in from_ckpt
File "torch/serialization.py", line 995, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "torch/serialization.py", line 449, in __init__
super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
2023-08-07 19:39:59,601 [enfugue] ERROR (gpu.py:145) Couldn't execute nvidia-smi (binary `nvidia-smi`): [Errno 2] No such file or directory: 'nvidia-smi'
Hi @lennartbrandin, so glad you're back!
First, this specific error message suggests the `.ckpt` file it's trying to load may be corrupt. If you're using the default model, check your downloaded copy at `~/.cache/enfugue/checkpoint/v1-5-pruned.ckpt` and make sure its `md5sum` is `fde08ee6f4fac7ab26592bf519cbb405`. If it isn't, delete the file and Enfugue will download it again.
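For reference, here's a quick way to do that check from a terminal:

```bash
# Verify the default checkpoint's hash; path and expected hash are the ones mentioned above.
md5sum ~/.cache/enfugue/checkpoint/v1-5-pruned.ckpt
# Expected: fde08ee6f4fac7ab26592bf519cbb405

# If it doesn't match, remove the file; Enfugue will re-download it on the next run.
rm ~/.cache/enfugue/checkpoint/v1-5-pruned.ckpt
```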
Next, I can already tell from your logs that Enfugue is unlikely to be talking to your GPU. I did some work on AMD GPU compatibility, but I only have V520s available to me, and those are technically unsupported, so it's been difficult to nail down consistent, generic steps for AMD users.
The fact that it's trying to use `nvidia-smi` tells me that it can't find `rocm-smi`, which is AMD's equivalent. Have you installed any ROCm drivers? I didn't mention this anywhere in my install instructions; if you've managed to get Stable Diffusion working with some other UI, you've probably already done this, but if not, it's an essential part of getting AMD devices to do AI/ML work in general. Here is some documentation on that; the TL;DR is that you need to run `amdgpu-install --usecase=rocm`. You'll also need to make sure whatever user you're running Enfugue as is a member of the `video` and `render` groups, and restart your machine after modifying group memberships (for some reason the change doesn't seem to take effect otherwise).
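A rough sketch of that group setup, assuming your login user is the one running Enfugue:

```bash
# Add the current user to the groups ROCm needs, then reboot.
sudo usermod -aG video,render "$USER"

# After logging back in, confirm the membership took effect:
groups

# If the ROCm drivers are installed correctly, this should list your GPU:
rocm-smi
```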
I'm sorry to say I have no idea which version(s) of ROCm will work with your hardware; there are many variables in play around architectures and LLVM targets that are totally unfamiliar to me, and it's entirely possible that no combination will work - at least with the build you have.
If you're willing to go the extra mile, your best bet for compatibility is going to be starting from source, and working with versions of Torch+ROCm until something sticks - this is how I was trying to get the V520 to work:
- Install Miniconda (Arch Link)
- Download this environment file anywhere on your machine.
- Run `conda env create -f <file_downloaded_above>`. This will take 5-15 minutes to install roughly 3 GB of dependencies.
- Run `conda activate enfugue` to set up your shell to use the conda-installed binaries and libraries.
- Run `pip install enfugue==0.2.0 --no-deps` to install Enfugue without pulling in any other packages (a couple of quick sanity checks follow this list).
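Sanity checks after those steps, assuming the environment in that file is named `enfugue`:

```bash
# The new environment should show up here:
conda env list

# Activate it and confirm the package landed without pulling extra dependencies:
conda activate enfugue
pip show enfugue
```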
You should now have a working Enfugue installation, though without any GPU acceleration yet. If you simply run `enfugue version`, this should be the output:
$ enfugue version
Enfugue v.0.2.0
Torch v.2.1.0.dev20230803+cpu
AI/ML Capabilities:
---------------------
Device type: cpu
CUDA: Unavailable
DirectML: Unavailable
MPS: Unavailable
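You can also ask Torch directly what it was built against; with the CPU-only nightly, a one-liner like this should report no GPU support (`torch.version.hip` is `None` on non-ROCm builds):

```bash
python -c "import torch; print(torch.__version__); print('GPU available:', torch.cuda.is_available()); print('HIP runtime:', torch.version.hip)"
```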
Now what we want to do is install the version of Torch that matches the version of ROCm installed earlier. You can get the version like this:
$ apt show rocm-libs -a
Package: rocm-libs
Version: 5.6.0.50600-67~22.04
...
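Since you're on Arch rather than Ubuntu, `apt` won't exist on your system; something along these lines should reveal the ROCm version instead (the package name is my guess, and the version file location is the usual one for ROCm installs):

```bash
# If ROCm came from the Arch repos or the AUR:
pacman -Qi rocm-core | grep Version

# ROCm installs usually also write a version file here:
cat /opt/rocm/.info/version
```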
Install the Torch build that matches this ROCm version like so; ignore the dependency-conflict errors pip prints at the end.
$ pip install torch==2.1.0.dev20230803+rocm5.6 --extra-index-url https://download.pytorch.org/whl/nightly
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com, https://download.pytorch.org/whl/nightly
Collecting torch==2.1.0.dev20230803+rocm5.6
Downloading https://download.pytorch.org/whl/nightly/rocm5.6/torch-2.1.0.dev20230803%2Brocm5.6-cp310-cp310-linux_x86_64.whl (1590.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 GB 285.5 MB/s eta 0:00:00
Requirement already satisfied: filelock in /home/ubuntu/miniconda3/envs/enfugue-test/lib/python3.10/site-packages (from torch==2.1.0.dev20230803+rocm5.6) (3.12.2)
Requirement already satisfied: typing-extensions in /home/ubuntu/miniconda3/envs/enfugue-test/lib/python3.10/site-packages (from torch==2.1.0.dev20230803+rocm5.6) (4.7.1)
Requirement already satisfied: sympy in /home/ubuntu/miniconda3/envs/enfugue-test/lib/python3.10/site-packages (from torch==2.1.0.dev20230803+rocm5.6) (1.12)
Requirement already satisfied: networkx in /home/ubuntu/miniconda3/envs/enfugue-test/lib/python3.10/site-packages (from torch==2.1.0.dev20230803+rocm5.6) (3.1)
Requirement already satisfied: jinja2 in /home/ubuntu/miniconda3/envs/enfugue-test/lib/python3.10/site-packages (from torch==2.1.0.dev20230803+rocm5.6) (3.1.2)
Requirement already satisfied: fsspec in /home/ubuntu/miniconda3/envs/enfugue-test/lib/python3.10/site-packages (from torch==2.1.0.dev20230803+rocm5.6) (2023.6.0)
Requirement already satisfied: MarkupSafe>=2.0 in /home/ubuntu/miniconda3/envs/enfugue-test/lib/python3.10/site-packages (from jinja2->torch==2.1.0.dev20230803+rocm5.6) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /home/ubuntu/miniconda3/envs/enfugue-test/lib/python3.10/site-packages (from sympy->torch==2.1.0.dev20230803+rocm5.6) (1.3.0)
Installing collected packages: torch
Attempting uninstall: torch
Found existing installation: torch 2.1.0.dev20230803+cpu
Uninstalling torch-2.1.0.dev20230803+cpu:
Successfully uninstalled torch-2.1.0.dev20230803+cpu
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
enfugue 0.2.0 requires diffusers<0.19,>=0.18, but you have diffusers 0.20.0.dev0 which is incompatible.
enfugue 0.2.0 requires einops<0.4,>=0.3, but you have einops 0.6.1 which is incompatible.
enfugue 0.2.0 requires invisible-watermark<0.2,>=0.1, but you have invisible-watermark 0.2.0 which is incompatible.
You can find all the versions available by going to https://download.pytorch.org/whl/nightly/torch/ in a browser. After you've installed this, and everything has been set up correctly, the output of `enfugue version` should change, indicating that CUDA is now available.
$ enfugue version
Failed to parse CPUID
Enfugue v.0.2.0
Torch v.2.1.0.dev20230803+rocm5.6
AI/ML Capabilities:
---------------------
Device type: cuda
CUDA: Ready
TensorRT: Unavailable
DirectML: Unavailable
MPS: Unavailable
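You can double-check that Torch itself sees the card with something like this (ROCm builds report themselves through the CUDA API, which is why Enfugue prints CUDA as ready):

```bash
python -c "import torch; print('HIP:', torch.version.hip); print('Available:', torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'no device found')"
```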
You can now execute `enfugue run`, and the server should start; from here, things may work for you. This is where I started getting segmentation faults on the V520 and stalled. I suspect you won't hit segmentation faults, but there may be other errors.
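If the server starts but ROCm crashes when you try to generate, one thing users of officially unsupported AMD cards sometimes try is overriding the GFX target before launching. I can't promise it helps on an RX 5700, so treat it purely as an experiment:

```bash
# Experimental and unverified on the RX 5700: advertise a supported GFX target to ROCm.
HSA_OVERRIDE_GFX_VERSION=10.3.0 enfugue run
```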
I'm sorry this is such a pain; the infrastructure around AMD AI/ML is fairly immature compared to Nvidia's, and it's difficult for me to get hold of hardware to test on. I'm very grateful for any effort you put into this, and I'd love to get this working at least somewhat consistently for AMD users.