stable-diffusion-webui

[Bug]: Segmentation fault (core dumped) with Docker PyTorch and gfx803

Open · picarica opened this issue · 2 comments

Checklist

  • [X] The issue exists after disabling all extensions
  • [X] The issue exists on a clean installation of webui
  • [X] The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • [X] The issue exists in the current version of the webui
  • [X] The issue has not been reported before recently
  • [X] The issue has been reported before but has not been fixed yet

What happened?

I heard that gfx803 should work with the AMD Docker images, so I followed these instructions: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs#install-on-amd-and-arch-linux. I fixed all the errors that came up during installation, but then got an unexplained segmentation fault (core dumped).

Steps to reproduce the problem

  1. Follow the instructions at https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs#install-on-amd-and-arch-linux. The only difference is that I launched with ./webui.sh and put those command-line arguments into webui-user.sh instead.
  2. export HSA_OVERRIDE_GFX_VERSION=10.3.0
  3. Fix an error that came up by applying this fix: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/11458#issuecomment-1611586137
  4. Segmentation fault.

What should have happened?

It should have launched. All drivers are working: rocminfo reports gfx803 both inside and outside the Docker container:

rocminfo | grep gfx
Name: gfx803
Name: amdgcn-amd-amdhsa--gfx803

What browsers do you use to access the UI ?

No response

Sysinfo

Host: Gentoo, Ryzen 5 5600X CPU, RX 570 GPU (gfx803), 16 GB RAM.

Console logs

First, I got this:
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.



Stable diffusion model failed to load
Applying attention optimization: Doggettx... done.

rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx803
 List of available TensileLibrary Files : 
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx941.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx940.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat"
Aborted (core dumped)
I got past this error by setting export HSA_OVERRIDE_GFX_VERSION=10.3.0.

After that, launching crashed with a segmentation fault:

REQS_FILE='requirements.txt' python launch.py --precision full --no-half --skip-torch-cuda-test
Python 3.9.18 (main, Sep 11 2023, 13:41:44) 
[GCC 11.2.0]
Version: v1.7.0
Commit hash: cf2772fab0af5573da775e7437e6acdca424f26e
Launching Web UI with arguments: --precision full --no-half --skip-torch-cuda-test
Segmentation fault (core dumped)
root@gentoo:/dockerx/stable-diffusion-webui# REQS_FILE='requirements.txt' python launch.py --precision full --no-half --skip-torch-cuda-test
Python 3.9.18 (main, Sep 11 2023, 13:41:44) 
[GCC 11.2.0]
Version: v1.7.0
Commit hash: cf2772fab0af5573da775e7437e6acdca424f26e
Launching Web UI with arguments: --precision full --no-half --skip-torch-cuda-test
Segmentation fault (core dumped)
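The first rocBLAS abort in the log is a lookup failure: rocBLAS could not find a Tensile kernel library for gfx803 in /opt/rocm/lib/rocblas/library, and the listed files cover other architectures only. With HSA_OVERRIDE_GFX_VERSION=10.3.0 the GPU is presented as gfx1030, which is in that list, and that would plausibly explain why the error went away. A minimal Python sketch, assuming the TensileLibrary_lazy_<arch>.dat naming shown in the log (`available_arches` and `has_kernels_for` are hypothetical helpers, not part of ROCm or the webui), that lists the covered architectures and checks a target:

```python
import re
from pathlib import Path

# Directory named in the rocBLAS error message.
ROCBLAS_LIB_DIR = Path("/opt/rocm/lib/rocblas/library")

# File names (without directory) copied from the log; used when the directory is absent.
LOGGED_FILES = [
    "TensileLibrary_lazy_gfx941.dat",
    "TensileLibrary_lazy_gfx90a.dat",
    "TensileLibrary_lazy_gfx1030.dat",
    "TensileLibrary_lazy_gfx1102.dat",
    "TensileLibrary_lazy_gfx940.dat",
    "TensileLibrary_lazy_gfx906.dat",
    "TensileLibrary_lazy_gfx1101.dat",
    "TensileLibrary_lazy_gfx1100.dat",
    "TensileLibrary_lazy_gfx942.dat",
    "TensileLibrary_lazy_gfx908.dat",
    "TensileLibrary_lazy_gfx900.dat",
]

# Assumption: lazy-loading libraries are named like TensileLibrary_lazy_gfx90a.dat.
ARCH_RE = re.compile(r"TensileLibrary_lazy_(gfx[0-9a-f]+)\.dat$")


def available_arches(names):
    """Return the sorted gfx targets that have a Tensile library file."""
    found = set()
    for name in names:
        match = ARCH_RE.search(str(name))
        if match:
            found.add(match.group(1))
    return sorted(found)


def has_kernels_for(arch, names):
    """True if a Tensile library file exists for the given gfx target."""
    return arch in available_arches(names)


if __name__ == "__main__":
    # Prefer the real directory (e.g. inside the container), else fall back to the log.
    if ROCBLAS_LIB_DIR.is_dir():
        names = [p.name for p in ROCBLAS_LIB_DIR.glob("TensileLibrary_lazy_gfx*.dat")]
    else:
        names = LOGGED_FILES
    print("Tensile libraries for:", ", ".join(available_arches(names)))
    for target in ("gfx803", "gfx900", "gfx1030"):
        status = "present" if has_kernels_for(target, names) else "missing"
        print(f"{target}: {status}")
```

Against the file list from the log, gfx803 comes out missing while gfx900 and gfx1030 are present. A present library only means rocBLAS will load kernels for that name; it does not show that those kernels run correctly on an RX 570.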

Additional information

No response

picarica, Feb 07 '24 10:02

I've got the same issue in a different context.

victorsl3, Feb 12 '24 10:02

HSA_OVERRIDE_GFX_VERSION=10.3.0 works well for Navi (RX 5000 and 6000 series), but your GPU is older. Try HSA_OVERRIDE_GFX_VERSION=9.0.0 instead.

DGdev91, Feb 15 '24 23:02
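The override value stands in for a gfx target name, so 10.3.0 makes the runtime treat the card as gfx1030 (Navi), while 9.0.0 corresponds to gfx900 (Vega), an older GCN part. A minimal Python sketch, assuming the three dot-separated fields map onto the digits of the gfx name with minor and stepping as single hex digits (`override_to_gfx` and `webui_env` are hypothetical helpers; the launch flags are the ones from the log in this issue):

```python
import os


def override_to_gfx(value: str) -> str:
    """Map an HSA_OVERRIDE_GFX_VERSION string such as "10.3.0" to a gfx target name."""
    major, minor, stepping = (int(part) for part in value.split("."))
    # Assumption: minor and stepping each occupy one hex digit of the target name.
    if not (0 <= minor < 16 and 0 <= stepping < 16):
        raise ValueError(f"unexpected override value: {value!r}")
    return f"gfx{major}{minor:x}{stepping:x}"


def webui_env(override: str, base=None) -> dict:
    """Copy the environment (or a given mapping) and set the override for a launch."""
    env = dict(os.environ if base is None else base)
    env["HSA_OVERRIDE_GFX_VERSION"] = override
    return env


if __name__ == "__main__":
    for value in ("10.3.0", "9.0.0"):
        print(f"HSA_OVERRIDE_GFX_VERSION={value} -> treated as {override_to_gfx(value)}")
    # Flags copied from the launch command in the log; only the override value changes.
    args = ["python", "launch.py", "--precision", "full", "--no-half", "--skip-torch-cuda-test"]
    print("Retry with:", "HSA_OVERRIDE_GFX_VERSION=9.0.0", " ".join(args))
```

The env dict from `webui_env("9.0.0")` could be passed to `subprocess.run(..., env=...)` to start launch.py with the suggested value. Whether gfx900 kernels actually run correctly on gfx803 hardware is not confirmed in this thread.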