stable-diffusion-webui

[Bug]: Stuck at "DiffusionWrapper has 859.52 M params." when running on a Radeon 5600 XT (Ubuntu)

Open 333377777 opened this issue 2 years ago • 1 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What happened?

Hello, I'm trying to run the web UI on an Ubuntu 22.04.5 machine with a Radeon 5600 XT GPU (6 GB), a Ryzen 5 5600 CPU, and 32 GB of RAM. I configured everything following the instructions.

When I try to start the UI, it gets stuck at "DiffusionWrapper has 859.52 M params." and GPU usage goes to 100%.

Executed code:

    source venv/bin/activate
    export HSA_OVERRIDE_GFX_VERSION=10.3.0
    TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.1.1' python launch.py --precision full --no-half

Own test results:

  1. It's not a network problem.
  2. Tried ROCm 5.1.1 and 5.2 → GPU 100%
  3. Tried the AMD GPU driver 22.40 and 22.20 for Ubuntu 20.04.5 → GPU 100%
  4. Tried the flags --xformers and --medvram → GPU 100%
  5. The GPU driver setup is OK; ROCm was checked with rocm-smi, sudo /opt/rocm/bin/rocminfo, and sudo /opt/rocm/opencl/bin/clinfo.
  6. If I do not run export HSA_OVERRIDE_GFX_VERSION=10.3.0 and skip the CUDA check, the UI works, but the AI calculation is performed by the CPU instead of the GPU, which is very slow.
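
Point 6 (GPU versus CPU fallback) can be checked from Python before launching the UI. This is a minimal sketch, assuming PyTorch is installed in the active venv; `describe_torch_backend` is a hypothetical helper name and not part of the web UI. It only queries availability and does not start any GPU work.

```python
def describe_torch_backend():
    """Report which backend the installed PyTorch build would use."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"

    # ROCm builds set torch.version.hip; CUDA builds set torch.version.cuda.
    hip = getattr(torch.version, "hip", None)
    cuda = getattr(torch.version, "cuda", None)
    build = f"ROCm/HIP {hip}" if hip else (f"CUDA {cuda}" if cuda else "CPU-only")

    # ROCm devices are also exposed through the torch.cuda namespace.
    if torch.cuda.is_available():
        return f"{build} build, {torch.cuda.device_count()} GPU(s) visible"
    return f"{build} build, no GPU visible"


print(describe_torch_backend())
```

If this reports no visible GPU, a CPU-only run like the one described in point 6 would be the expected outcome.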

Some supplementary details: the GPU hits 100% about 1 minute after starting, and it is not released even after the process ends. Repeating this causes a crash, and I have to restart my PC.

I asked some people who had similar situations. Most of them had Radeon 5000 series cards.

Thank you!

Steps to reproduce the problem

  1. Set up as instructed in the wiki
  2. Start the UI with the command provided in the wiki
  3. ...

What should have happened?

UI should start up.

Commit where the problem happens

None. There are no errors; it just gets stuck with the GPU at 100%, and the GPU cannot be released.

What platforms do you use to access the UI ?

Linux

What browsers do you use to access the UI ?

No response

Command Line Arguments

`TORCH_COMMAND=python launch.py --precision full --no-half `

List of extensions

No

Console logs

none, there are no errors in the console

Additional information

No response

333377777 avatar Feb 18 '23 04:02 333377777

same but on ubuntu on aws with nvidia drivers. Works fine locally, idk why it does this

DevL0rd avatar Feb 28 '23 16:02 DevL0rd

Did you ever get this sorted out? This problem suddenly popped up today. Was fine till yesterday. Tried reinstalling. But no luck.

regstuff avatar Mar 31 '23 10:03 regstuff

I have the exact same problem today. I reinstalled everything from scratch, nothing helps, I'm stuck at the same message.

Gpu doesn't get to 100% though, it doesn't load anything and is asleep.

Tried to use with --use-cpu, same problem

kriimakt avatar Apr 05 '23 15:04 kriimakt

> I have the exact same problem today. I reinstalled everything from scratch, nothing helps, I'm stuck at the same message.
>
> Gpu doesn't get to 100% though, it doesn't load anything and is asleep.
>
> Tried to use with --use-cpu, same problem

I sorted this by resetting the GPU.

regstuff avatar Apr 05 '23 16:04 regstuff

> I sorted this by resetting the GPU

What do you mean by "resetting the GPU"?

steveruu avatar Apr 05 '23 16:04 steveruu

I have the same issue, running webui-user.bat progresses to that line and then halts. Nothing further happens, no errors reported. UI never comes up. I'll note that my GPU doesn't idle at 100% like the original reporter's does but otherwise identical.

    Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
    Commit hash: 22bcc7be428c94e9408f589966c2040187245d81
    Installing requirements for Web UI
    Launching Web UI with arguments:
    No module 'xformers'. Proceeding without it.
    Loading weights [6ce0161689] from D:\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned-emaonly.safetensors
    Creating model from config: D:\stable-diffusion-webui\configs\v1-inference.yaml
    LatentDiffusion: Running in eps-prediction mode
    DiffusionWrapper has 859.52 M params.

TickTockBent avatar Apr 05 '23 16:04 TickTockBent

Has anyone found out how to fix this?

DarkNmaster avatar Apr 05 '23 16:04 DarkNmaster

does the same on Windows with NVIDIA drivers lol

steveruu avatar Apr 05 '23 16:04 steveruu

Updated my NVIDIA drivers to 531.41 as a test, no fix

TickTockBent avatar Apr 05 '23 16:04 TickTockBent

Same problem for me today: NVIDIA 531.41, Windows 10. Stuck at DiffusionWrapper.

> `venv "D:\SD\stable-diffusion-webui\venv\Scripts\Python.exe"
> Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
> Commit hash: <none>
> Installing requirements for Web UI
> 
> Installing imageio-ffmpeg requirement for depthmap script
> Installing pyqt5 requirement for depthmap script
> 
> 
> Launching Web UI with arguments: --xformers
> Loading weights [9aba26abdf] from D:\SD\stable-diffusion-webui\models\Stable-diffusion\deliberate_v2.safetensors
> Creating model from config: D:\SD\stable-diffusion-webui\configs\v1-inference.yaml
> LatentDiffusion: Running in eps-prediction mode
> DiffusionWrapper has 859.52 M params.`

Slevin9120 avatar Apr 05 '23 16:04 Slevin9120

> > I sorted this by resetting the GPU
>
> What do you mean by "resetting the GPU"?

I've only done it on an AMD GPU. Here's chatgpt on doing it for nvidia stuff. 13ng2xu

regstuff avatar Apr 05 '23 16:04 regstuff

Do you guys think it's a bug, or is it just a problem with our PCs?

DarkNmaster avatar Apr 05 '23 17:04 DarkNmaster

> Updated my NVIDIA drivers to 531.41 as a test, no fix

Almost certainly a bug. I have other installs of stable diffusion web UI working fine

TickTockBent avatar Apr 05 '23 17:04 TickTockBent

This is caused by stable-diffusion-webui\venv\Lib\site-packages\huggingface_hub\file_download.py

    with FileLock(lock_path):
        # If the download just completed while the lock was activated.
        if os.path.exists(pointer_path) and not force_download:
            # Even if returning early like here, the lock will be released.
            return pointer_path

The problem is that lock_path can contain illegal characters such as "

Just replace them, then everything goes well.
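
The thread does not spell out why a bad lock_path shows up as a silent hang rather than an error. One plausible explanation, which is an assumption here, is that lock acquisition is retried until it succeeds, and a path that can never be created never succeeds. The sketch below uses only the standard library and is not the filelock implementation; `acquire_lock_file` is a hypothetical helper that shows the same retry loop with a timeout, so an uncreatable path surfaces as an error.

```python
import os
import time


def acquire_lock_file(lock_path, timeout=1.0, poll_interval=0.05):
    """Create lock_path exclusively, giving up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    last_error = None
    while time.monotonic() < deadline:
        try:
            # O_EXCL makes creation fail if the file already exists.
            return os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except OSError as exc:
            # Covers "already locked" as well as "path can never be created".
            last_error = exc
            time.sleep(poll_interval)
    raise TimeoutError(f"could not create {lock_path!r}: {last_error}")
```

Without the deadline, the loop would keep polling an invalid path forever, which would look like the stall after "DiffusionWrapper has 859.52 M params."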

kinamoe avatar Apr 05 '23 17:04 kinamoe

By the way, lock_path is derived from etag, so the real problem is that etag can contain illegal characters.

    # From now on, etag and commit_hash are not None.
    assert etag is not None, "etag must have been retrieved from server"
    assert commit_hash is not None, "commit_hash must have been retrieved from server"
    blob_path = os.path.join(storage_folder, "blobs", etag)
    pointer_path = os.path.join(storage_folder, "snapshots", commit_hash, relative_filename)
    # Prevent parallel downloads of the same file with a lock.
    lock_path = blob_path + ".lock"
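
Since blob_path and lock_path embed the etag directly, any character in the etag that is not valid in a file name ends up in the path. A minimal sketch for locating such characters follows; `find_illegal_chars` is a hypothetical helper, the reserved set is the one Windows rejects in file names, and the sample etag is made up for illustration.

```python
# Characters Windows rejects in file names; control characters are also invalid.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def find_illegal_chars(name):
    """Return (index, character) pairs that are not valid in a Windows file name."""
    return [
        (i, ch)
        for i, ch in enumerate(name)
        if ch in WINDOWS_RESERVED or ord(ch) < 32
    ]


# Hypothetical etag value, for illustration only.
print(find_illegal_chars('W/"0123abcd"'))
```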

kinamoe avatar Apr 05 '23 17:04 kinamoe

> Just replace them, then everything goes well.

What exactly should be replaced?

steveruu avatar Apr 05 '23 17:04 steveruu

> > Just replace them, then everything goes well.
>
> What exactly should be replaced?

You can print the etag by inserting print(etag) and see the position of the illegal characters. You can just drop them.

For me, I solved the problem by changing the code to:

    # From now on, etag and commit_hash are not None.
    assert etag is not None, "etag must have been retrieved from server"
    assert commit_hash is not None, "commit_hash must have been retrieved from server"
    blob_path = os.path.join(storage_folder, "blobs", etag[3:])
    pointer_path = os.path.join(storage_folder, "snapshots", commit_hash, relative_filename)

What I changed is etag[3:]: I found that the first 3 characters of my etag were illegal, so I dropped them.
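
Note that etag[3:] removes three characters unconditionally, so it would also truncate an etag that arrives clean. A more defensive sketch is below. It assumes, without confirmation from this thread, that the junk is the HTTP weak-validator prefix W/ followed by a double quote, which happens to be three characters. `normalize_etag` is a hypothetical helper, not the library's own function.

```python
def normalize_etag(etag):
    """Strip an optional weak-validator prefix and surrounding double quotes."""
    cleaned = etag.strip()
    if cleaned.startswith("W/"):
        cleaned = cleaned[2:]
    return cleaned.strip('"')


# Hypothetical values, for illustration only.
for raw in ('W/"0123abcd"', '"0123abcd"', "0123abcd"):
    print(repr(raw), "->", repr(normalize_etag(raw)))
```

Under that assumption, the patched line would pass normalize_etag(etag) instead of etag[3:] to os.path.join.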

kinamoe avatar Apr 05 '23 17:04 kinamoe

> What I change is etag[3:], since I found the first 3 character of etag is illegal, so I drop them.

oh my god, you're a genius; it fixed it for me!

steveruu avatar Apr 05 '23 17:04 steveruu

Thanks @kinamoe that was very helpful. For others, the line in question is line 1262 of /venv/Lib/site-packages/huggingface_hub/file_download.py

For some reason my etag was including some extra junk at the start.

TickTockBent avatar Apr 05 '23 17:04 TickTockBent

This worked also for me! Thank you so much for the fast fix!

Slevin9120 avatar Apr 05 '23 17:04 Slevin9120

In my case, I had the same problem. My etag did not have the [3:], so I had to add it. After I added [3:] to the etag, it works fine: blob_path = os.path.join(storage_folder, "blobs", etag[3:])

ZeiZeiArt avatar Apr 05 '23 17:04 ZeiZeiArt

It works for me!

liangchaob avatar Apr 05 '23 17:04 liangchaob

It seems this bug only started happening in the last few days?

XiaosongLin avatar Apr 05 '23 17:04 XiaosongLin

Same problem here, fixed by applying @kinamoe 's solution. Thank you very much.

Rkyzzy avatar Apr 05 '23 18:04 Rkyzzy

you guys are awesome, thank you for finding the fix, god bless you all.

mllsmr10 avatar Apr 05 '23 18:04 mllsmr10

Sorry, I'm kind of new to this and am having the same problem. What do you mean by using print(etag) to find illegal characters, and am I supposed to delete the code and replace it? If so, which parts of it, or do I just put it in somewhere? Still learning coding.

TheVexing avatar Apr 05 '23 18:04 TheVexing

Just came across this myself, as a first time user. You guys are awesome.

Change

    blob_path = os.path.join(storage_folder, "blobs", etag)

to

    blob_path = os.path.join(storage_folder, "blobs", etag[3:])

SoaringMoon avatar Apr 05 '23 19:04 SoaringMoon

TO SUM UP

  • Go to: <INSTALL_FOLDER>\stable-diffusion-webui\venv\Lib\site-packages\huggingface_hub\file_download.py
  • Open in your fav IDE
  • Jump to line 1262 (As of the time of writing this)
  • Edit the line to be as per above.
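
Because the line number may differ between huggingface_hub versions, searching for the statement itself is more robust than jumping to line 1262. This is a rough sketch; `find_lines` is a hypothetical helper and the sample text is a small stand-in for the real file.

```python
def find_lines(source_text, needle):
    """Return 1-based line numbers of lines in source_text that contain needle."""
    return [
        number
        for number, line in enumerate(source_text.splitlines(), start=1)
        if needle in line
    ]


# Small stand-in for the real file contents, for illustration only.
sample = "\n".join([
    'assert etag is not None, "etag must have been retrieved from server"',
    'blob_path = os.path.join(storage_folder, "blobs", etag)',
    'lock_path = blob_path + ".lock"',
])
print(find_lines(sample, 'os.path.join(storage_folder, "blobs", etag'))
```

For the real file, the text could be read with pathlib.Path(path).read_text(encoding="utf-8"), where path points at file_download.py in your own install.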

Alipoodle avatar Apr 05 '23 19:04 Alipoodle

What an absolute legend Thank you! Worked for me!!!

Married-CRJ avatar Apr 05 '23 19:04 Married-CRJ

hero

remyatjak avatar Apr 05 '23 19:04 remyatjak