stable-diffusion-webui
[Bug]: Default rocm 5.4.2 is too outdated (Broken on Navi 2x)
Checklist
- [x] The issue exists after disabling all extensions
- [x] The issue exists on a clean installation of webui
  bef51aed * master origin/master v1.8.0 Merge branch
- [ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
- [X] The issue exists in the current version of the webui
- [ ] The issue has not been reported before recently
- [ ] The issue has been reported before but has not been fixed yet
What happened?
Hi! On a NixOS system with s-d-nix and a recent-ish rocm-runtime-5.7.2, the default torch==2.0.1+rocm5.4.2 chosen for second-generation AMD Navi (Navi 2x) cards causes a segfault in rocr::AMD::hsa_amd_memory_lock_to_pool.
This is fixed when I borrow the command for Navi 3: export TORCH_COMMAND="pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.7" and stuff it in my webui-user.sh.
(I also run it wrapped in steam-run for NixOS compatibility, but that's irrelevant to this bug.)
Choosing the torch package based on GPU generation is wrong. The script should probe the system ROCm version and pick the matching torch package that way.
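The probing approach suggested above could look roughly like the sketch below. It is not what webui.sh does today. It assumes the ROCm version can be read from a plain-text file (/opt/rocm/.info/version is a common location, but a NixOS store layout would need a different probe), and the names detect_rocm_version, rocm_index_url and ROCM_VERSION_FILE are made up for illustration. Not every ROCm release has a matching PyTorch wheel index, so a real script would also need a fallback.

```shell
# Sketch: derive the PyTorch wheel index from the ROCm version, not the GPU model.

# Assumption: the ROCm version is stored in a plain-text file.
# ROCM_VERSION_FILE is a hypothetical override, not an existing webui setting.
detect_rocm_version() {
    version_file="${ROCM_VERSION_FILE:-/opt/rocm/.info/version}"
    if [ -r "$version_file" ]; then
        # Print the first line, e.g. "5.7.1-98".
        head -n 1 "$version_file"
    fi
}

# Turn a version such as "5.7.1-98" into an index URL ending in "rocm5.7".
# Returns non-zero if the string does not look like "major.minor...".
rocm_index_url() {
    case "$1" in
        [0-9]*.[0-9]*) ;;
        *) return 1 ;;
    esac
    major="${1%%.*}"            # text before the first dot
    rest="${1#*.}"              # text after the first dot
    minor="${rest%%.*}"         # drop the patch level, if any
    minor="${minor%%[!0-9]*}"   # drop any non-numeric suffix
    printf 'https://download.pytorch.org/whl/rocm%s.%s\n' "$major" "$minor"
}

# Only override TORCH_COMMAND when a version was actually found.
rocm_version="$(detect_rocm_version)"
if [ -n "$rocm_version" ] && index_url="$(rocm_index_url "$rocm_version")"; then
    export TORCH_COMMAND="pip install torch torchvision --index-url $index_url"
fi
```

Placed in webui-user.sh, this leaves TORCH_COMMAND untouched when no version file is found, so the script's existing defaults still apply on systems without a system ROCm install.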
Console logs
#0 0x00007fff610f67c9 in rocr::AMD::hsa_amd_memory_lock_to_pool(void*, unsigned long, hsa_agent_s*, int, hsa_amd_memory_pool_s, unsigned int, void**) ()
from /nix/store/pk2mnvmz69c5f3m8615yqbl44p2lksrl-rocm-runtime-5.7.1/lib/libhsa-runtime64.so
On openSUSE Tumbleweed, the install script gave me torch-2.3.0+rocm5.7 for my Navi 21 card. I did have to export HCC_AMDGPU_TARGET=gfx1030. I don't have a system ROCm install, and don't really want one.
A new version of ROCm PyTorch is out, and the current install script is broken because the download link is gone.
Fix: open webui.sh and replace every occurrence of rocm5.7 with rocm6.2.
Also, I am finding that HSA_OVERRIDE_GFX_VERSION=11.0.0 is required to launch on a 7800 XT.
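For anyone who would rather not edit by hand, the search-and-replace fix described above can be done with sed. This is only a sketch: bump_rocm_index is a made-up helper name, and it blindly rewrites every rocm5.7 string in the file, so check the result before launching.

```shell
# Sketch: swap the ROCm wheel index version inside a launcher script.
# bump_rocm_index is a hypothetical helper name, not part of webui.
bump_rocm_index() {
    # -i.bak edits the file in place and keeps the original as <file>.bak.
    sed -i.bak 's/rocm5\.7/rocm6.2/g' "$1"
}

# Usage (run from the webui checkout):
#   bump_rocm_index webui.sh
```

If the result is wrong, the untouched original is still available as webui.sh.bak.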