automatic1111-webui-nix
Segfault with AMD / ROCm on NixOS stable 23.05 in stable-diffusion-webui, probably due to conflicting Mesa build
After cloning https://github.com/AUTOMATIC1111/stable-diffusion-webui I tried to run it with ROCm using `nix develop .#rocm`. However, the webui crashes with a segmentation fault as soon as it gets up and running.
In the shell I get the error `DRI driver not from this Mesa build ('23.0.3' vs '23.1.9')`, which indicates an incompatible Mesa build is present. My main system runs the stable branch of NixOS, currently 23.05, which is where Mesa 23.0.3 comes from. Any idea why 23.1.9 is there? Presumably because the flake pulls in a newer nixpkgs?
Any ideas how to get around this? Help is appreciated. :smile:
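If the mismatch really is the flake tracking a newer nixpkgs than the host system, one workaround is to point the flake's `nixpkgs` input at the same branch the host uses, so the Mesa userspace and the DRI drivers come from the same build. A sketch, assuming the input is actually named `nixpkgs` (check `nix flake metadata` to confirm):

```nix
{
  # Pin the flake to the same nixpkgs branch as the host system (23.05 here),
  # so both sides agree on the Mesa build.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-23.05";
}
```

Alternatively, `nix develop .#rocm --override-input nixpkgs github:NixOS/nixpkgs/nixos-23.05` should do the same without editing the flake.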
Full output here:
################################################################
Launching launch.py...
################################################################
ldconfig: Can't open cache file /nix/store/gqghjch4p1s69sv4mcjksb2kb65rwqjy-glibc-2.38-23/etc/ld.so.cache: No such file or directory
Cannot locate TCMalloc (improves CPU memory usage)
Python 3.10.13 (main, Aug 24 2023, 12:59:26) [GCC 12.3.0]
Version: v1.6.0
Commit hash: 5ef669de080814067961f28357256e8fe27544f4
Launching Web UI with arguments:
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
Calculating sha256 for /home/ap/Coding/python/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors: Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 6.7s (prepare environment: 2.1s, import torch: 2.1s, import gradio: 0.6s, setup paths: 0.5s, other imports: 0.4s, load scripts: 0.4s, create ui: 0.4s, gradio launch: 0.1s).
Opening in existing browser session.
DRI driver not from this Mesa build ('23.0.3' vs '23.1.9')
failed to bind extensions
DRI driver not from this Mesa build ('23.0.3' vs '23.1.9')
failed to bind extensions
DRI driver not from this Mesa build ('23.0.3' vs '23.1.9')
failed to bind extensions
DRI driver not from this Mesa build ('23.0.3' vs '23.1.9')
failed to bind extensions
6ce0161689b3853acaa03779ec93eafe75a02f4ced659bee03f50797806fa2fa
Loading weights [6ce0161689] from /home/ap/Coding/python/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors
Creating model from config: /home/ap/Coding/python/stable-diffusion-webui/configs/v1-inference.yaml
./webui.sh: line 255: 144988 Segmentation fault (core dumped) "${python_cmd}" -u "${LAUNCH_SCRIPT}" "$@"
Same here. I'm also struggling to get it to work in Distrobox, which seems to be broken on NixOS due to NixOS not being FHS-compliant, despite Distrobox being in the NixOS repos.
Sadly, same here. Is there any fix for this, please?
@rastarr my fix is called OCI containers.
I got this to work fairly reliably: https://github.com/ai-dock/stable-diffusion-webui
You just have to disable `sd_dreambooth_extension` and `bitsandbytes` in the provisioning script and pass the appropriate environment variable via docker-compose to tell ROCm which GPU arch you are running. That worked for me.
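For reference, the arch override is usually the `HSA_OVERRIDE_GFX_VERSION` environment variable. A hedged docker-compose sketch — the service name and image tag below are placeholders (check the ai-dock README for the real ones), and `10.3.0` is only correct for RDNA2 cards, so substitute the value matching your GPU:

```yaml
services:
  webui:                       # placeholder service name
    image: <ai-dock image>     # placeholder; see the project's README
    environment:
      # Make ROCm treat the GPU as a supported arch.
      # 10.3.0 = gfx1030 (RDNA2); use the value for your card.
      HSA_OVERRIDE_GFX_VERSION: "10.3.0"
    devices:
      # Device nodes ROCm containers typically need access to
      - /dev/kfd
      - /dev/dri
```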
I did some digging and found this. Using ROCm 5.5.0 fixed the segfault for me (RX 580):

1. Re-initialize your venv.
2. Set `TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.5.0'`.
3. Run `python3 launch.py --precision full --no-half --opt-sub-quad-attention --lowvram --disable-nan-check --skip-torch-cuda-test`.

In this case `webui.sh` does not need to be touched.
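The steps above amount to the following, run from the root of the stable-diffusion-webui checkout (a sketch; paths assume the default venv location):

```shell
# 1. Throw away the old venv so torch is reinstalled fresh
rm -rf venv

# 2. Tell launch.py to install the ROCm 5.5 torch wheels instead of the default
export TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.5.0'

# 3. Then launch (flags reported to work on an 8 GB RX 580):
#    python3 launch.py --precision full --no-half --opt-sub-quad-attention --lowvram --disable-nan-check --skip-torch-cuda-test
```

`launch.py` reads `TORCH_COMMAND` from the environment when it sets up the venv, which is why `webui.sh` itself doesn't need editing.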
How do we go about changing the flake to use ROCm 5.5.0? Any help would be greatly appreciated.
To be fair, I'd rather not pin an old version of ROCm in the flake itself. Generally something newer (5.7 is the most recent I'm running right now, but I will test 6.0 soon) works fine for me, so it should work fine here as well. However, if you want to make the change locally, I'd suggest finding out which checkout of nixpkgs gives you ROCm 5.5 and using that. A big issue is that, depending on how you do it, you might have to recompile everything, which won't be fun.
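For anyone attempting that local pin: once you've found a nixpkgs revision that still ships ROCm 5.5, you can try it without editing the flake at all. A sketch (`<rev>` is a placeholder, not a real commit hash):

```shell
# Substitute a real nixpkgs commit hash for <rev>.
# Note this may trigger large rebuilds if the binary cache has moved on.
nix develop .#rocm --override-input nixpkgs github:NixOS/nixpkgs/<rev>
```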
Well, I actually changed the runtime in the flake to `rocmPackages_5.rocm-runtime` to lock it to 5.7, then entered the shell and ran `export TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.5.0'`.

Now I'm running my 8 GB RX 580 with `./webui.sh --precision full --no-half --opt-sub-quad-attention --lowvram --disable-nan-check --skip-torch-cuda-test --share` and all is working. So that's great!