
Update to ROCm 5.7 and PyTorch

[Open] alexhegit opened this issue 5 months ago • 30 comments

By default, webui.sh installs PyTorch built against ROCm 5.4.2. On Ubuntu 22.04 with an AMD Radeon Pro W7900, the webui fails to run with a segmentation fault, possibly an ABI compatibility issue.

ROCm 5.7 is currently the latest version supported by PyTorch (https://pytorch.org/). I tested PyTorch + ROCm 5.7 with the AMD Radeon Pro W7900 and it passes.
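
For anyone who wants to try this ahead of a fix, the install can be overridden without editing webui.sh (a sketch; TORCH_COMMAND is the override the launcher already honors, and the index URL is the official PyTorch ROCm 5.7 wheel repo):

export TORCH_COMMAND="pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.7"
./webui.sh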

[screenshots of the W7900 passing with PyTorch + ROCm 5.7]

alexhegit avatar Feb 02 '24 06:02 alexhegit

Needs a few comments from other users.

AUTOMATIC1111 avatar Feb 02 '24 09:02 AUTOMATIC1111

Been using 5.7 for weeks without any issues on AMD RX 7900 XT

Mantas-2155X avatar Feb 02 '24 16:02 Mantas-2155X

I have a 6700 XT, and updating to PyTorch 2.1 + ROCm 5.7 (I think I tried 5.6 as well) makes my generations slower and sometimes locks them up. I've just not had a lot of success with anything beyond 2.0.1 + ROCm 5.4.2; newer versions work but perform worse for me and my card. I recently rebuilt my machine from the ground up, tested it again, got fed up with it, and downgraded. EDIT: Actually I had tested 2.1 + ROCm 5.6; I hadn't noticed PyTorch 2.2 was the latest, so I'll test that when I get a chance to see if the performance issues were resolved. EDIT2: Tried it, not good. I can generate normal images with no issues, but once I use a larger size or hires.fix, it stutters like mad, takes forever to upscale, and then my machine freezes until it reports HIP out of memory and fails. I have no such issue at all with PyTorch 2.0.1 + ROCm 5.4.2; I ran the exact same generation and it performs fine.

Soulreaver90 avatar Feb 03 '24 23:02 Soulreaver90

I have a 6700 XT, and updating to PyTorch 2.1 + ROCm 5.7 [...] I have no such issue at all with PyTorch 2.0.1 + ROCm 5.4.2; I ran the exact same generation and it performs fine.

PyTorch 2.2.0 + ROCm 5.7 should be the officially paired versions. Hopefully it runs well on your 6700 XT. BTW: what it/s do you get (512x512, 100 steps) with 2.0.1 + ROCm 5.4.2 on the 6700 XT?

alexhegit avatar Feb 04 '24 03:02 alexhegit

If you read my second edit, I tried 2.2 + 5.7 and it doesn't work well for me. Normal generation is fine but takes a bit longer to start. Hires fix or any larger resolution is unusable! It takes forever to upscale, freezes my computer, and then runs out of memory. I do not have this problem with 2.0.1 + 5.4.2. My avg at 512 is ~6.6 it/s.

Soulreaver90 avatar Feb 04 '24 12:02 Soulreaver90

I've been on PyTorch Preview with ROCm 5.7 for about a month now; it seems to work fine. Around 4 it/s IIRC for 1024x1024 SDXL on my 6950XT.

Edit: I can attest to the hires issues. What has worked for me is to use a manually downloaded 4x_realesrgan instead of the "builtin" upscaler models. It still takes a bit to start (longer than the SD pipeline, but not unusably long) and then runs fine.
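
If anyone wants to replicate that workaround: manually downloaded ESRGAN-family .pth upscalers generally live under the webui's models directory, along the lines of the path below (the file name is a hypothetical example; pick the model in the upscaler dropdown afterwards):

# hypothetical location for a manually downloaded upscaler
stable-diffusion-webui/models/ESRGAN/4x_realesrgan.pth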

L3tum avatar Feb 05 '24 08:02 L3tum

I've been using 5.7 for a while, currently torch 2.2 + rocm 5.7, and it seems to work fine for me.

7900 XTX. Gets about 18 it/s

freescape avatar Feb 17 '24 20:02 freescape

Do we need to install different versions for different video cards?

AUTOMATIC1111 avatar Feb 18 '24 04:02 AUTOMATIC1111

I've been using 5.7 for a while, currently torch 2.2 + rocm 5.7, and it seems to work fine for me. 7900 XTX. Gets about 18 it/s

Almost the same for me.

alexhegit avatar Feb 19 '24 05:02 alexhegit

The default ROCm 5.4 gets a segmentation fault with the Radeon W7900 (maybe all Navi31), and that version is too old for long-term use.

alexhegit avatar Feb 19 '24 05:02 alexhegit

Most of this special-case code for installing PyTorch on ROCm is a very hacky and fragile workaround for people with specific issues. And then you get stuff like https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14293, which should never have been merged into the dev branch (it currently installs whatever the latest torch-2.3.0dev build is).

If PyTorch 2.1.2 is what's supported (as per the 1.8.0-RC release notes), then just install that; anyone who requires something different can supply their own TORCH_COMMAND.

pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/rocm5.6

https://pytorch.org/get-started/previous-versions/
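
For reference, such an override can live in webui-user.sh so it survives updates (a sketch, reusing the exact pip command from above; the launcher reads TORCH_COMMAND from this file):

# webui-user.sh
export TORCH_COMMAND="pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/rocm5.6"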

Personally I have no problem with the current 2.2.0 stable release used in this pull request but that doesn't match "Update torch to version 2.1.2" from the 1.8.0-RC release notes.

EDIT

Also note that Navi1 (RX 5000 series) cards don't work with PyTorch 2.x. Installing torch==1.13.1+rocm5.2 on the dev branch still works to get a functional webui that can do basic rendering on an RX 5500 XT 8GB, but I haven't tested past that, and very obviously this is not sustainable going forward. Navi1 support will have to be dropped unless the PyTorch 2.x issue can be solved.

https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/11048
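
Spelled out, the last-known-working Navi1 combination from above (hedged: the torch pin matches what's referenced here, and the HSA override mirrors the gfx10* mapping in the script posted below; whether 10.3.0 is right for every Navi1 card is not guaranteed):

export HSA_OVERRIDE_GFX_VERSION=10.3.0
export TORCH_COMMAND="pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 --index-url https://download.pytorch.org/whl/rocm5.2"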

MrLavender avatar Feb 19 '24 14:02 MrLavender

I am using Linux Mint with a 6750 XT. PyTorch always defaults to rocm5.4.2. Is this a good way of detecting AMD GPUs?

# Check if lspci command is available
if ! command -v lspci &> /dev/null; then
    echo "lspci command not found. Please make sure it is installed."
    exit 1
fi

# Use lspci to list PCI devices and grep for VGA compatible controller
gpu_brand=$(lspci | grep "VGA compatible controller")
# Check the GPU company
if [[ $gpu_brand == *AMD* ]]; then
    echo "AMD GPU detected."
    
    # Check if rocminfo is installed
    if ! command -v rocminfo &> /dev/null; then
        echo "Error: rocminfo is not installed. Please install ROCm and try again."
        exit 1
    fi

    # Get GPU information using rocminfo
    rocm_info=$(rocminfo)

    # Extract GPU identifier (gfx part) from rocminfo output
    gpu_info=$(echo "$rocm_info" | awk '/^Agent 2/,/^$/ {if ($1 == "Name:" && $2 ~ /^gfx/) {gsub("AMD", "", $2); print $2; exit}}')

    # Define officially supported GPU versions
    supported_versions="gfx900 gfx906 gfx908 gfx90a gfx942 gfx1030 gfx1100"
    # Check if the extracted gfx_version is in the list of supported versions
    if echo "$supported_versions" | grep -qw "$gpu_info"; then
        echo "AMD $gpu_info is officially supported by ROCm."
        export TORCH_COMMAND="pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.7"
    else
        if [[ $gpu_info == gfx9* ]]; then
            export HSA_OVERRIDE_GFX_VERSION=9.0.0
            export TORCH_COMMAND="pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 --index-url https://download.pytorch.org/whl/rocm5.2"
            printf "\n%s\n" "${delimiter}"
            printf "Experimental support gfx9 series: make sure to have at least 4GB of VRAM and 10GB of RAM or enable cpu mode: --use-cpu all --no-half"
            printf "\n%s\n" "${delimiter}"
        elif [[ $gpu_info == gfx10* ]]; then
            export HSA_OVERRIDE_GFX_VERSION=10.3.0
            export TORCH_COMMAND="pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.7"
        elif [[ $gpu_info == gfx11* ]]; then
            export HSA_OVERRIDE_GFX_VERSION=11.0.0
            export TORCH_COMMAND="pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm6.0"
        fi
    fi
    if echo "$gpu_info" | grep -q "Huawei"; then
        export TORCH_COMMAND="pip install torch==2.1.0 torchvision --index-url https://download.pytorch.org/whl/cpu; pip install torch_npu"
    fi

elif [[ $gpu_brand == *NVIDIA* ]]; then
    echo "NVIDIA GPU detected."
else
    echo "Unable to identify GPU manufacturer."
    exit 1
fi
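
To sanity-check what that awk extraction returns on a given machine before trusting the branch logic, the same pattern runs standalone:

rocminfo | awk '/^Agent 2/,/^$/ {if ($1 == "Name:" && $2 ~ /^gfx/) {print $2; exit}}'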

chiragkrishna avatar Feb 20 '24 02:02 chiragkrishna

It's a better solution.

alexhegit avatar Feb 20 '24 14:02 alexhegit

This way, the ROCm version can be chosen by the user:

# Check if lspci command is available
if ! command -v lspci &>/dev/null; then
    echo "lspci command not found. Please make sure it is installed."
    exit 1
fi

# Use lspci to list PCI devices and grep for VGA compatible controller
gpu_brand=$(lspci | grep "VGA compatible controller")
# Check the GPU company
if [[ $gpu_brand == *AMD* ]]; then
    echo "AMD GPU detected."

    # Check if rocminfo is installed
    if ! command -v rocminfo &>/dev/null; then
        echo "Error: rocminfo is not installed. Please install ROCm and try again."
        exit 1
    fi

    # Get GPU information using rocminfo
    rocm_info=$(rocminfo)

    # Extract GPU identifier (gfx part) from rocminfo output
    gpu_info=$(echo "$rocm_info" | awk '/^Agent 2/,/^$/ {if ($1 == "Name:" && $2 ~ /^gfx/) {gsub("AMD", "", $2); print $2; exit}}')
    # Define officially supported GPU versions
    supported_versions="gfx900 gfx906 gfx908 gfx90a gfx942 gfx1030 gfx1100"
    # Check if the extracted gfx_version is in the list of supported versions
    if echo "$supported_versions" | grep -qw "$gpu_info"; then
        echo "AMD $gpu_info is officially supported by ROCm."
    else
        echo "AMD $gpu_info is not officially supported by ROCm."
        if [[ $gpu_info == gfx9* ]]; then
            export HSA_OVERRIDE_GFX_VERSION=9.0.0
            printf "\n%s\n" "${delimiter}"
            printf "Experimental support gfx9 series: make sure to have at least 4GB of VRAM and 10GB of RAM or enable cpu mode: --use-cpu all --no-half"
            printf "\n%s\n" "${delimiter}"
        elif [[ $gpu_info == gfx10* ]]; then
            export HSA_OVERRIDE_GFX_VERSION=10.3.0
        elif [[ $gpu_info == gfx11* ]]; then
            export HSA_OVERRIDE_GFX_VERSION=11.0.0
        fi
        echo "Changed HSA_OVERRIDE_GFX_VERSION to $HSA_OVERRIDE_GFX_VERSION"
    fi
    # Function to display menu
    display_menu() {
        echo "Choose your ROCM version:"
        echo "1. torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2"
        echo "2. torch==2.0.1+rocm5.4.2 torchvision==0.15.2+rocm5.4.2"
        echo "3. ROCM-5.6"
        echo "4. ROCM-5.7"
        echo "5. ROCM 6 (Preview)"
        echo "6. CPU-Only"
    }

    # Function to handle user input
    handle_input() {
        read -p "Enter your choice (1-5): " choice
        case $choice in
        1)
            echo "You selected Option 1"
            export TORCH_COMMAND="pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 --index-url https://download.pytorch.org/whl/rocm5.2"
            ;;
        2)
            echo "You selected Option 2"
            export TORCH_COMMAND="pip install torch==2.0.1+rocm5.4.2 torchvision==0.15.2+rocm5.4.2 --index-url https://download.pytorch.org/whl/rocm5.4.2"
            ;;
        3)
            echo "You selected Option 3"
            export TORCH_COMMAND="pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.6"
            ;;
        4)
            echo "You selected Option 4"
            export TORCH_COMMAND="pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.7"
            ;;
        5)
            echo "You selected Option 5"
            export TORCH_COMMAND="pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.0"
            ;;
        6)
            echo "You selected Option 6"
            export TORCH_COMMAND="pip install torch==2.1.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu; pip install torch_npu"
            ;;
        *)
            echo "Invalid choice. Please enter a number between 1 and 5"
            ;;
        esac
    }

    display_menu
    handle_input

elif [[ $gpu_brand == *NVIDIA* ]]; then
    echo "NVIDIA GPU detected."
else
    echo "Unable to identify GPU manufacturer."
    exit 1
fi
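
One possible refinement (a sketch, not part of the proposal above): skip the menu when TORCH_COMMAND is already exported, so unattended installs stay non-interactive:

    # only prompt when the user hasn't already chosen a torch build
    if [[ -z "${TORCH_COMMAND:-}" ]]; then
        display_menu
        handle_input
    fi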

chiragkrishna avatar Feb 20 '24 15:02 chiragkrishna

I think giving AMD owners the option to choose between the old stable ROCm and the latest and greatest would be best. And if the latest and greatest doesn't work, a simple arg or setting could be used to revert. All I know is that the latest versions work horribly on my 6700 XT and I'm not sure why, but the latest version is required for the newer-gen cards. I'm indifferent; I can install whatever version. It's the non-tech folks who would potentially run into issues.

Soulreaver90 avatar Feb 24 '24 11:02 Soulreaver90

I am using a 6750 XT; it works about the same with the latest PyTorch on ROCm 5.7 and also the 6.0 preview.

chiragkrishna avatar Feb 24 '24 11:02 chiragkrishna

Interesting. I just tried the 6.0 preview with torch 2.3.0 and it seems to be a lot better than 5.5-5.7 ever was. My initial generation takes a while at first, but then it works. Hires fix on 5.5-5.7 would grind to a halt and I would get an OOM; that never happened to me on 5.4 with the same workflow. On the 6.0 preview, while the first hires pass was slow as molasses, it didn't OOM and the subsequent hires generations worked just fine.

Currently on 2.3.0.dev20240222+rocm6.0

Update: ehh, played around with different resolutions and ran into OOM again. Downgraded back to 5.4.2 and everything is smooth as butter. Not sure if the issue is my card, ROCm 5.5+, or hires in general.
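
For reference, that downgrade is just reinstalling the old wheels into the webui venv (the same pin as option 2 in the menu script above):

source venv/bin/activate
pip install torch==2.0.1+rocm5.4.2 torchvision==0.15.2+rocm5.4.2 --index-url https://download.pytorch.org/whl/rocm5.4.2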

Soulreaver90 avatar Feb 24 '24 11:02 Soulreaver90

For the initial-generation slowness, do this:

wget https://raw.githubusercontent.com/wiki/ROCmSoftwarePlatform/pytorch/files/install_kdb_files_for_pytorch_wheels.sh

Activate your venv, then:

# Optional; replace 'gfx1030' with your architecture
export GFX_ARCH=gfx1030

# Optional; replace 5.7 with your preferred ROCm version
export ROCM_VERSION=5.7

./install_kdb_files_for_pytorch_wheels.sh

from Link
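
Put together, the whole sequence looks roughly like this (assuming the webui venv lives at ./venv; the GFX_ARCH/ROCM_VERSION values are examples for a gfx1030 card on ROCm 5.7):

wget https://raw.githubusercontent.com/wiki/ROCmSoftwarePlatform/pytorch/files/install_kdb_files_for_pytorch_wheels.sh
chmod +x install_kdb_files_for_pytorch_wheels.sh
source venv/bin/activate
export GFX_ARCH=gfx1030     # your gfx target from rocminfo
export ROCM_VERSION=5.7     # your installed ROCm version
./install_kdb_files_for_pytorch_wheels.sh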

chiragkrishna avatar Feb 24 '24 13:02 chiragkrishna

For the initial-generation slowness, do this [...] ./install_kdb_files_for_pytorch_wheels.sh

Nada, still runs like donkey butt. No idea why, either. Anything above 5.4.2 runs slow or sends me to an OOM. I've been trying since 5.5, each time being forced back down. Not sure if it's PyTorch that is the problem or the ROCm build; I've tried matching the OS ROCm build to the PyTorch build with no success. I am on 6.0.2 and tried 6.0 and, while a bit better, it runs into the same issues I've encountered with 5.5+.

Soulreaver90 avatar Feb 24 '24 14:02 Soulreaver90

I did a quick test with rocm5.4.2, rocm5.7, and rocm6.0. GPU = 6750 XT, OS = Linux Mint 21.3, ROCm driver version = 6.0.2. Here are the results:

torch2.0.1 rocm5.4.2 [screenshot]

torch2.2.1 rocm5.7 [screenshot]

torch2.3.0 rocm6.0 [screenshot]

As you can see, no difference.

chiragkrishna avatar Feb 25 '24 02:02 chiragkrishna

@chiragkrishna

Here is a quick comparison video between 5.4.2 and 5.7 on my machine. Pay attention to the mouse: I try to move it in both, but you will see it stutter horribly on 5.7 and take forever to upscale. This is with minimal chg, steps, prompts; anything more complex leads to an OOM. No issue on 5.4.2. I had this bad result from 5.5 through 6.0. My OS was redone from scratch in Nov '23 and I had the same results before then.

Ubuntu 22.04, A1111 1.7. I will clone the latest RC and start fresh to see if something I have installed is breaking things, but I suspect not.

5.4.2 https://youtu.be/QxRpp9wL_Jk

5.7 https://youtu.be/aiM2obDWZHI

Soulreaver90 avatar Feb 25 '24 10:02 Soulreaver90

It is slow in both cases.

  1. Try installing the HWE kernel:
     sudo apt install linux-generic-hwe-22.04
  2. Use only the ROCm from the AMD stack; don't install the graphics drivers:
     sudo amdgpu-install --usecase=rocm --no-dkms
chiragkrishna avatar Feb 25 '24 11:02 chiragkrishna

It is slow in both cases. [HWE kernel and ROCm-only amdgpu-install suggestions quoted above]

Both already installed and configured as described. Update: tried a fresh install of A1111 with 5.7 out of the gate, same issues. Tried another browser, same result. Regular gens are "fine", but larger resolutions or hires upscales are horrible. No idea what's wrong, but I'll just stay on 5.4.2 until I get a new card, I guess.

Soulreaver90 avatar Feb 25 '24 11:02 Soulreaver90

@Soulreaver90 Not sure about this one, but the exact same issues happen to me on Windows. Whether I use ZLUDA or the normal DirectML route, the exact same issues appear. I don't have that issue on Linux with ROCm, though.

From what I can tell, on Windows the VRAM isn't freed unless I quit the overall process (not just SD; the whole terminal needs to be closed), which means the VRAM is basically full after one generation and then almost everything runs through shared memory. But I'm not sure if that's the actual issue or just a symptom of something else. I did notice that the exact same parameters take up much more space, and I've actually run out of RAM on Windows (32GB), while Linux is completely fine.

Either way, maybe you should try Windows and see if you have the exact opposite experience from me 😆

L3tum avatar Feb 25 '24 13:02 L3tum

Both already installed and configured as described. [...] I'll just stay on 5.4.2 until I get a new card, I guess.

I've been running into the same issues, but with slightly different versions. I was running 5.6 fine, then made a bunch of changes at once (stupid, I know), one of which was going to 5.7, and I've had these OOM/hires-fix issues and lower res/batch limits for about a week. So you've confirmed that rolling back to 5.4 fixed these issues for you? I've been thinking about rolling back, but figured maybe I broke something else, so I hadn't messed with that yet since normal gens and upscales were 'fine'-ish. [7800 XT]

Symber13 avatar Feb 26 '24 05:02 Symber13

So you've confirmed that rolling back to 5.4 fixed these issues for you? [...] [7800 XT]

You wouldn’t be able to roll back to 5.4.2 because the 7000 series cards require ROCm 5.5 at minimum. But I’m curious if there is some setting or configuration that might be breaking highres.
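
With all this version juggling, one quick way to confirm which ROCm a venv's torch was actually built against (torch.version.hip is the HIP version string on ROCm wheels, None on other builds):

python -c "import torch; print(torch.__version__, torch.version.hip)"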

Soulreaver90 avatar Feb 26 '24 05:02 Soulreaver90

You wouldn't be able to roll back to 5.4.2 because the 7000 series cards require ROCm 5.5 at minimum. [...]

I rolled back to 5.6, which is what I previously had working well, but no luck; still seeing the issue. I don't think it's specifically hires though, not exclusively. My basic initial generation size decreased, and Tiled Diffusion also won't let me upscale past that size in the first step. It seems like something greatly increased the VRAM used and/or reduced the sizes I can generate (in the first step, before upscaling).

Symber13 avatar Feb 26 '24 07:02 Symber13

Well, what an odd turn of events. I updated to WebUI 1.8.0 and decided to try PyTorch 2.2.1+rocm5.7 ... and it seems to be working now? At first it stuttered a bit doing hires.fix, but after I terminated and relaunched the webui, everything seems to run just fine. I do run into OOM a bit more often at odd or higher resolutions, but it works half the time. It's a bit of a trade-off, but it otherwise works.

Soulreaver90 avatar Mar 03 '24 03:03 Soulreaver90

Well, what an odd turn of events. I updated to WebUI 1.8.0 and decided to try PyTorch 2.2.1+rocm5.7 ... and it seems to be working now? [...]

Thanks for posting this! I likely would have gotten around to it eventually (I've been tinkering a little, with no luck, every day or two), but I popped it open as soon as I noticed your post. Pulled 1.8, did a fresh uninstall/reinstall of ROCm just to be extra careful, and BOOM, I can use hires fix again!

I haven't fully tested the limits yet; I want to see if I can render at my old resolutions. But as things stand, I'm at least able to hires fix at 2x (the default) at normal speeds. Previously, with the issue, it would bog down at 1.85 (the highest it would go without OOM) and had to be as low as 1.7 for normal-speed results.

Symber13 avatar Mar 03 '24 15:03 Symber13

Will merge this into dev tomorrow if there are no objections.

AUTOMATIC1111 avatar Mar 03 '24 16:03 AUTOMATIC1111