
Segfault with AMD / ROCm on NixOS stable 23.05 in stable-diffusion-webui, probably due to conflicting Mesa build

Open ic4-y opened this issue 2 years ago • 6 comments

After cloning https://github.com/AUTOMATIC1111/stable-diffusion-webui I tried to run it with ROCm using nix develop .#rocm. However, the webui crashes with a segmentation fault as soon as it gets up and running.

In the shell I get the error DRI driver not from this Mesa build ('23.0.3' vs '23.1.9'), which would indicate that an incompatible build of Mesa is present. My main system runs the stable branch of NixOS, currently 23.05, which is where Mesa build 23.0.3 would come from. Any ideas why 23.1.9 is there? Presumably because the flake pulls in a newer nixpkgs?
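The error names two Mesa versions: the host's DRI driver build and the one the dev shell linked against. A small sketch that parses the (copied) log line makes explicit which side each version belongs to:

```shell
# Parse the Mesa mismatch message from the log above.
# The first quoted version is the host's DRI driver build (NixOS 23.05 here),
# the second is the Mesa the nix dev shell was built against.
line="DRI driver not from this Mesa build ('23.0.3' vs '23.1.9')"
host=$(printf '%s\n' "$line" | sed -E "s/.*'([^']+)' vs '([^']+)'.*/\1/")
shell_ver=$(printf '%s\n' "$line" | sed -E "s/.*'([^']+)' vs '([^']+)'.*/\2/")
printf 'host mesa: %s, dev-shell mesa: %s\n' "$host" "$shell_ver"
# → host mesa: 23.0.3, dev-shell mesa: 23.1.9
```

The mismatch itself supports the guess below: the flake's nixpkgs input is newer than the host's stable channel, so the shell ships a Mesa the host's DRI drivers were not built from.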

Any ideas how to get around this? Help is appreciated. :smile:

Full output here:

################################################################
Launching launch.py...
################################################################
ldconfig: Can't open cache file /nix/store/gqghjch4p1s69sv4mcjksb2kb65rwqjy-glibc-2.38-23/etc/ld.so.cache
: No such file or directory
Cannot locate TCMalloc (improves CPU memory usage)
Python 3.10.13 (main, Aug 24 2023, 12:59:26) [GCC 12.3.0]
Version: v1.6.0
Commit hash: 5ef669de080814067961f28357256e8fe27544f4
Launching Web UI with arguments: 
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
Calculating sha256 for /home/ap/Coding/python/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors: Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 6.7s (prepare environment: 2.1s, import torch: 2.1s, import gradio: 0.6s, setup paths: 0.5s, other imports: 0.4s, load scripts: 0.4s, create ui: 0.4s, gradio launch: 0.1s).
Opening in existing browser session.
DRI driver not from this Mesa build ('23.0.3' vs '23.1.9')
failed to bind extensions
DRI driver not from this Mesa build ('23.0.3' vs '23.1.9')
failed to bind extensions
DRI driver not from this Mesa build ('23.0.3' vs '23.1.9')
failed to bind extensions
DRI driver not from this Mesa build ('23.0.3' vs '23.1.9')
failed to bind extensions
6ce0161689b3853acaa03779ec93eafe75a02f4ced659bee03f50797806fa2fa
Loading weights [6ce0161689] from /home/ap/Coding/python/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors
Creating model from config: /home/ap/Coding/python/stable-diffusion-webui/configs/v1-inference.yaml
./webui.sh: line 255: 144988 Segmentation fault      (core dumped) "${python_cmd}" -u "${LAUNCH_SCRIPT}" "$@"

ic4-y avatar Oct 31 '23 19:10 ic4-y

Same here. I'm also struggling to get it to work in Distrobox, because Distrobox seems to be broken due to NixOS not being FHS-compliant, despite the fact that Distrobox is in the NixOS repos.

nonetrix avatar Jan 18 '24 06:01 nonetrix

Sadly, same here. Is there any fix for this, please?

rastarr avatar Apr 02 '24 10:04 rastarr

@rastarr my fix is called OCI containers.

I got this to work fairly reliably: https://github.com/ai-dock/stable-diffusion-webui

You just have to disable sd_dreambooth_extension and bitsandbytes in the provisioning script and pass the appropriate environment variable via docker-compose to tell ROCm which GPU architecture you are running; that worked for me.
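The standard ROCm variable for this is HSA_OVERRIDE_GFX_VERSION (the exact variable the ai-dock image expects may differ, and the service name and value below are assumptions; pick the value matching your GPU architecture). A docker-compose fragment might look like:

```yaml
# docker-compose.yml fragment (hypothetical service name "webui";
# 10.3.0 is an example value for RDNA2 cards, not a recommendation)
services:
  webui:
    environment:
      - HSA_OVERRIDE_GFX_VERSION=10.3.0
```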

ic4-y avatar Apr 02 '24 11:04 ic4-y

I did some digging and found this:

Using rocm version 5.5.0 fixed segfault for me (RX 580):

Reinitialize your venv
Enter this: TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.5.0'
Run python3 launch.py --precision full --no-half --opt-sub-quad-attention --lowvram --disable-nan-check --skip-torch-cuda-test
In this case webui.sh does not need to be touched
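The quoted steps can be written out as one small script for convenience (a sketch: the venv path and the assumption that TORCH_COMMAND is picked up by webui's launcher come from the quote above, and the exact flags are copied verbatim):

```shell
# Write the three quoted steps to a helper script (hypothetical name).
cat > relaunch-rocm55.sh <<'EOF'
#!/usr/bin/env sh
# 1. Reinitialize the venv so the old torch wheels are gone
rm -rf venv && python3 -m venv venv && . venv/bin/activate
# 2. Tell the launcher which torch build to install (ROCm 5.5 wheels)
export TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.5.0'
# 3. Launch with the flags from the quote above
python3 launch.py --precision full --no-half --opt-sub-quad-attention --lowvram --disable-nan-check --skip-torch-cuda-test
EOF
chmod +x relaunch-rocm55.sh
```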

How do we go about changing the flake to use rocm5.5.0? Any help would be greatly appreciated.

rastarr avatar Apr 07 '24 00:04 rastarr

To be fair, I'd rather not use an old version of ROCm in the flake itself. Generally something newer works fine for me (5.7 is the most recent I am running right now, but I will test 6.0 soon), so that should work fine here as well. However, if you want to make the change locally, I'd suggest finding out which checkout of nixpkgs gives you ROCm 5.5 and using that. A big issue is that, depending on how you do it, you might have to recompile everything, which won't be fun.

ic4-y avatar Apr 07 '24 11:04 ic4-y

Well, I actually changed the runtime in the flake to rocmPackages_5.rocm-runtime to lock it to 5.7, then entered the shell and did export TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.5.0'.

Now I'm running my 8 GB RX 580 with ./webui.sh --precision full --no-half --opt-sub-quad-attention --lowvram --disable-nan-check --skip-torch-cuda-test --share and everything is working. So that's great!

rastarr avatar Apr 08 '24 00:04 rastarr