
update torch base environment

Open vladmandic opened this issue 1 year ago • 29 comments

this pr is a single-step update of the pytorch base environment:

  • from torch 1.13.1 with cuda 11.7 and cudnn 8.5.0
  • to torch 2.0.0 with cuda 11.8 and cudnn 8.7.0

this allows usage of the sdp cross-attention optimization and better multi-gpu support with accelerate, and avoids a large number of performance issues caused by broken cudnn builds in some environments
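
as a quick sanity check after the switch, something like this (run from inside the webui venv; the expected values in the comments assume the pins in this pr) should report the new stack:

```python
# quick sanity check for the upgraded environment (run inside the webui venv)
import torch

print(torch.__version__)               # expected with this pr: 2.0.0+cu118
print(torch.version.cuda)              # expected: 11.8
print(torch.backends.cudnn.version())  # expected: 8700, i.e. cudnn 8.7.0
# the kernel behind the sdp cross-attention optimization mentioned above:
print(hasattr(torch.nn.functional, "scaled_dot_product_attention"))  # True on torch 2.x
```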

it updates all required packages, but avoids any prereleases:

  • torchvision (plus silences a future deprecation warning)
  • xformers (update follows torch)
  • accelerate (required to support new torch)
  • numpy (update of numpy is required by new accelerate)

note:

  • since accelerate changed the format of its config file, run accelerate config once to avoid (non-critical) warnings; a non-interactive sketch follows this list
  • colab has updated to torch 2.0, so having webui still on the older torch causes issues for users running webui in hosted environments
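
as referenced in the note above, a minimal sketch of regenerating the accelerate config without the interactive prompts (write_basic_config is accelerate's helper for writing a default config file; the mixed_precision value here is only an assumption, adjust it to your setup):

```python
# write a fresh default accelerate config, roughly equivalent to accepting
# the defaults in an interactive `accelerate config` run
from accelerate.utils import write_basic_config

write_basic_config(mixed_precision="no")  # assumption: no mixed precision
```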

yes, updating torch is a major step, but it will have to be done sooner or later, as there are more and more reports of issues installing the old torch version

vladmandic avatar Mar 30 '23 15:03 vladmandic

Testing these changes out - things seem to work "out of the box", but I still get the No module 'xformers'. Proceeding without it message when starting up. Not sure if this can be ignored, as it's seemingly included in torch 2.

glass-ships avatar Mar 30 '23 20:03 glass-ships

but I still get the No module 'xformers'. Proceeding without it message when starting up

its not "included", its just not necessary given new sdp is available
(depending on the use-case, low-end gpus are still better with xformers).
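
for context, the built-in path in question is torch 2.0's scaled_dot_product_attention; a minimal standalone sketch (shapes are illustrative only, not taken from webui):

```python
import torch
import torch.nn.functional as F

# (batch, heads, tokens, head_dim) - illustrative shapes only
q = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)

# torch dispatches to a fused kernel (flash / memory-efficient / math) on its own,
# which is what the sdp option relies on instead of the xformers package
out = F.scaled_dot_product_attention(q, k, v)
```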

the remaining message comes from an external repo - repositories/stable-diffusion-stability-ai - and removing the warning would cause that repo to get out-of-sync. unfortunately, it's not posted via a logger (which could be filtered out) but with a simple print statement.

vladmandic avatar Mar 30 '23 21:03 vladmandic

have you tested these changes on unix? runpod?

FurkanGozukara avatar Mar 30 '23 21:03 FurkanGozukara

runpod

linux yes. runpod no. there are thousands of gpu cloud providers, cannot test each one like that.

vladmandic avatar Mar 30 '23 21:03 vladmandic

Yes, at some point will have to migrate to torch 2.0 since newer xformer wheels require it.

drax-xard avatar Mar 30 '23 22:03 drax-xard

runpod

linux yes. runpod no. there are thousands of gpu cloud providers, cannot test each one like that.

ok list me 20 :)

anyway i am just saying that covering as many widely used scenarios as possible is good

Yes, at some point will have to migrate to torch 2.0 since newer xformer wheels require it.

correct, and i solved this problem by downloading and re-uploading the torch 1 wheel, version 0.0.18dev489. they are also still compiling them, thankfully. i think automatic1111 can do it the same way; the wheel and such things could be hosted on hugging face, i think. currently they have removed all 0.0.14 and 0.0.17 torch 1 builds from pip installation.

FurkanGozukara avatar Mar 30 '23 23:03 FurkanGozukara

Yes, at some point will have to migrate to torch 2.0 since newer xformer wheels require it.

This is true only for wheels posted to pypi. You can find a wide range of pre-built xformers wheels in their Github action artifacts, if you still need a wheel for an older torch. Not as simple as keeping up to date via pypi, but useful in a pinch. [screenshot of available artifact builds]

Just keep in mind you need to be logged into Github to download artifacts.

Cyberbeing avatar Mar 31 '23 00:03 Cyberbeing

Yes, at some point will have to migrate to torch 2.0 since newer xformer wheels require it.

Just keep in mind you need to be logged into Github to download artifacts.

sorry, can we use discussions for this and keep pr comments as pr comments? i'd love to collect/implement anything that's required, but this is not pr-related at all.

vladmandic avatar Mar 31 '23 00:03 vladmandic

I'm running into a really strange problem. Any advice of how I should narrow down the root cause?

Edit: Oops, forgot to say my startup arguments: --xformers --opt-channelslast --no-half-vae

I was just trying this PR out (as-is, plus exporting xformers==0.0.18 in launch.py).

Everything upgraded and ran smoothly for the most part, but when I tried to generate a larger image (e.g. 1024x1024), I realized there is a problem for me -- It seemed to hang at 100% GPU for 4 minutes! The sampler steps had completed, but the image had not saved to file yet. Then after 4 mins, the image finally completed and saved.

After this, the problem goes away if I generate more at the same resolution (until the WebUI process is restarted).

However, if I change the image resolution to anything different, e.g. to 1024x1088, the very same delay happens, and again only for the first run at that resolution.

After investigating, I realized there were also delays for smaller images, but the delay grows exponentially as resolution scales up.

Here is a quick table showing the times I measured. Note: These measurements also include the typical 'warm-up' time before the steps start progressing, which was already a little annoying. After warm-up, the gen time for small images is much faster.

First Runs w/ 5 steps (Euler a):

Gen Time = time for all steps completed

Size        Gen Time   Total Time
512x512     0:03       0:04
640x640     0:01       0:06
704x704     0:02       0:08
768x768     0:02       0:09
832x832     0:03       0:11
896x832     0:03       0:12
896x896     0:03       2:23
1024x1024   0:04       4:14

Here are some screenshots of the output I saw. [screenshots]

System: Win10, RTX 2070S 8GB, Intel 3770k

Version info: [screenshot]

EfourC avatar Apr 03 '23 02:04 EfourC

@EfourC Very strange problem. Since the recent update is not stable, you could check whether this problem reproduces on an older version, for example git checkout a9fed7c.

Sakura-Luna avatar Apr 03 '23 03:04 Sakura-Luna

With that commit, and the current master, the problem doesn't happen -- I only see it after everything is upgraded for Torch2.

Behavior OK: [screenshots]

EfourC avatar Apr 03 '23 04:04 EfourC

I did some more permutations of testing, especially to see if --opt-sdp-attention instead of xformers made a difference with the Torch 2 venv.

What I found out is that the problem is actually --opt-channelslast causing the massive delay for me with Torch 2.

Both of these startup arg sets work ok: --xformers --no-half-vae and --opt-sdp-attention --no-half-vae

Using --opt-channelslast with either of the above creates the problem delay for me.
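
For anyone unfamiliar with the flag, my understanding is that --opt-channelslast essentially switches the model to torch's channels_last memory format. Here's a minimal standalone sketch of that layout change (not the actual webui code path; my guess is the per-resolution first-run delay comes from cuDNN selecting convolution algorithms for the new shape/layout, but I haven't confirmed that):

```python
import torch
import torch.nn as nn

# what --opt-channelslast roughly does: move conv-heavy modules to NHWC layout
conv = nn.Conv2d(4, 320, kernel_size=3, padding=1).half().cuda()
conv = conv.to(memory_format=torch.channels_last)

x = torch.randn(1, 4, 128, 128, device="cuda", dtype=torch.float16)
x = x.to(memory_format=torch.channels_last)

with torch.inference_mode():
    y = conv(x)  # cuDNN picks an algorithm per shape/layout on first use

print(y.is_contiguous(memory_format=torch.channels_last))  # typically True
```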

I haven't looked at (or used previously) any of the other performance optimization switches, but it's probably worth people trying them out on different types of systems (since I blundered into an issue with this one). https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Command-Line-Arguments-and-Settings

EfourC avatar Apr 03 '23 04:04 EfourC

Honestly, if there is going to be a move to Torch 2.0.0, it should wait until after Torch 2.0.1 is released, as there is currently a major bug that made it into GA that breaks compatibility with WebUI when using torch.compile. See: https://github.com/pytorch/pytorch/pull/97862 and https://github.com/pytorch/pytorch/issues/93405

mariaWitch avatar Apr 04 '23 15:04 mariaWitch

I'm aware of that issue, but WebUI does not use torch.compile on its own, and anyone experienced enough to use it would hand-pick the torch version manually anyhow.

Torch 2.1 has no benefits for the normal WebUI user. And the existing Torch 1.13 is showing its teeth with quite a few install issues lately.

The whole point of the PR is not to enable experimental use, but to make it simpler for normal users.

vladmandic avatar Apr 04 '23 16:04 vladmandic

fyi, i initially updated xformers to 0.0.18, but there are frequent reports of NaN values, especially during hires operations, so i've downgraded to 0.0.17. performance-wise, i don't see any major difference, so this is not a big loss. like i said before, the goal of this PR is to get the cleanest out-of-the-box environment where the least number of users have issues, not to just go with the latest & greatest.

vladmandic avatar Apr 04 '23 21:04 vladmandic

Torch 2.1 has no benefits for the normal WebUI user. And the existing Torch 1.13 is showing its teeth with quite a few install issues lately.

The whole point of the PR is not to enable experimental use, but to make it simpler for normal users.

It isn't 2.1; we aren't waiting a whole major release. Torch 2.0.1 came out of phase 0 yesterday. I still believe that Torch 2.0 should not be merged until the blocking issue upstream is resolved in the next minor update, as I believe PyTorch botched the initial GA release of 2.0, and we shouldn't be running that version of PyTorch until it is more mature.

mariaWitch avatar Apr 04 '23 21:04 mariaWitch

Torch 2.1 has no benefits for the normal WebUI user. And the existing Torch 1.13 is showing its teeth with quite a few install issues lately. The whole point of the PR is not to enable experimental use, but to make it simpler for normal users.

It isn't 2.1; we aren't waiting a whole major release. Torch 2.0.1 came out of phase 0 yesterday. I still believe that Torch 2.0 should not be merged until the blocking issue upstream is resolved in the next minor update, as I believe PyTorch botched the initial GA release of 2.0, and we shouldn't be running that version of PyTorch until it is more mature.

and then we'd have to wait for xformers to publish new wheels, etc... again, torch.compile is not used by webui, so there is no benefit for the standard user in waiting for torch 2.0.1. and colab upgraded to torch 2.0, as did many other hosted environments, so right now running webui there requires additional manual steps - which is far more important to resolve than "wait for the ideal version".

vladmandic avatar Apr 05 '23 14:04 vladmandic

and then we'd have to wait for xformers to publish new wheels, etc... again, torch.compile is not used by webui, so there is no benefit for the standard user in waiting for torch 2.0.1. and colab upgraded to torch 2.0, as did many other hosted environments, so right now running webui there requires additional manual steps - which is far more important to resolve than "wait for the ideal version".

As it stands right now, the only people you are claiming are affected are people using cloud setups, who most likely have already done the manual work to support PyTorch 2.0.0. There is no reason for PyTorch to be upgraded to 2.0.0 when it is very clearly NOT stable. It is not worth risking adding even more bugs to the code base as it currently stands.

mariaWitch avatar Apr 05 '23 16:04 mariaWitch

very clearly NOT stable

that is a very strong statement. can you substantiate this? all errors i've seen so far have been related to torch.compile and yes, that feature is pretty much broken.

on the other hand, there are hundreds of users using torch 2.0 with webui without issues.

vladmandic avatar Apr 05 '23 17:04 vladmandic

that is a very strong statement. can you substantiate this?

To name a few: https://github.com/pytorch/pytorch/issues/97031 https://github.com/pytorch/pytorch/issues/97041 https://github.com/pytorch/pytorch/issues/97226 https://github.com/pytorch/pytorch/issues/97576 https://github.com/pytorch/pytorch/issues/97021

And not only that, I disagree with moving to 2.0.0 on principle as .0.0 software is generally never stable. Waiting for 2.0.1 has no downsides whereas 2.0.0 is an unstable mess that they are still trying to get stable. The last thing this repo needs is more instability which causes more issues to flood in.

mariaWitch avatar Apr 06 '23 14:04 mariaWitch

that is a very strong statement. can you substantiate this?

To name a few: pytorch/pytorch#97031 pytorch/pytorch#97041 pytorch/pytorch#97226 pytorch/pytorch#97576 pytorch/pytorch#97021

Bugs relevant to WebUI are what matter - why list random things? this is going in the wrong direction.

  • For example, I don't think anyone is trying to run it on a RaspberryPi, so let's stay on topic?
  • And WebUI uses venv, not conda, so conda on osx-64 also doesn't really apply.
  • Or whether torchvision has debug symbols or not - I'd consider that a cosmetic issue at best.
  • Yes, the libnvrtc packaging bug is relevant, but there is no PR associated with it, so should we wait indefinitely?

And not only that, I disagree with moving to 2.0.0 on principle as .0.0 software is generally never stable. Waiting for 2.0.1 or even 2.0.2 has no downsides whereas 2.0.0 is an unstable mess that they are still trying to get stable. The last thing this repo needs is more instability which causes more issues to flood in.

That is a question of personal preference and risk vs reward. The issue is that Torch 1.13 wheels are getting obsoleted in many packages and/or environments, causing failures to install. So what's the solution? Ignore current issues until some unknown time in the future?

PR stands as-is and I've been using Torch 2.0 on my branch for a while now (and users on my branch are not reporting issues relevant to WebUI). We can agree to disagree here.

vladmandic avatar Apr 06 '23 14:04 vladmandic

Convolutions being broken for Cuda 11.8 builds specifically affects users who use Pytorch 2.0 and --opt-channelslast. It basically negates any possible performance benefits from that option.
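
A rough way to check the impact on a given build (assuming a CUDA GPU is available; shapes are illustrative, not a rigorous benchmark):

```python
import time
import torch
import torch.nn as nn

def bench(memory_format, iters=50):
    # a single SD-sized convolution, timed in the requested memory layout
    conv = nn.Conv2d(320, 320, 3, padding=1).half().cuda().to(memory_format=memory_format)
    x = torch.randn(1, 320, 96, 96, device="cuda", dtype=torch.float16)
    x = x.to(memory_format=memory_format)
    with torch.inference_mode():
        for _ in range(5):  # warm-up / algorithm selection
            conv(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            conv(x)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print("contiguous:   ", bench(torch.contiguous_format))
print("channels_last:", bench(torch.channels_last))
```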

mariaWitch avatar Apr 07 '23 14:04 mariaWitch

Recently PyTorch changed its install command; it now uses --index-url instead of --extra-index-url, as mentioned in #9483

Also, i noticed your code doesn't cover AMD cards (in that case TORCH_COMMAND is set in webui.sh). But it's fine; i had some problems on my 5700XT on pytorch 2 (see #8139) and already covered that part in #9404. i feel it's better to stay on 1.13.1 a while longer, at least on AMD.

DGdev91 avatar Apr 11 '23 10:04 DGdev91

Been having a lot of issues trying to get things working with my 5700XT. The new torch 2.0 version failed to generate any images.

Using the latest rocm as below failed to generate images or produce any console output:
export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2"

Using the torch command from the pr works:
export TORCH_COMMAND="pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 --index-url https://download.pytorch.org/whl/rocm5.2"

At least it is working on Commit hash: a9fed7c364061ae6efb37f797b6b522cb3cf7aa2

sjdevries avatar Apr 16 '23 16:04 sjdevries

rocm 5.6.0 alpha is out and it brings torch 2.0 compatibility, i'd be curious if that works.

vladmandic avatar Apr 16 '23 16:04 vladmandic

5.6.0? maybe that will include support for my 7900xtx

PennyFranklin avatar Apr 16 '23 16:04 PennyFranklin

rocm 5.6.0 alpha is out and it brings torch 2.0 compatibility, i'd be curious if that works.

Really? Where? I see 5.4.3 as the last release on https://github.com/RadeonOpenCompute/ROCm/releases

DGdev91 avatar Apr 17 '23 14:04 DGdev91

rocm 5.6.0 alpha is out and it brings torch 2.0 compatibility, i'd be curious if that works.

Really? Where? I see 5.4.3 as the last release on https://github.com/RadeonOpenCompute/ROCm/releases

https://rocmdocs.amd.com/projects/alpha/en/develop/deploy/install.html

vladmandic avatar Apr 17 '23 15:04 vladmandic

rocm 5.6.0 alpha is out and it brings torch 2.0 compatibility, i'd be curious if that works.

Really? Where? I see 5.4.3 as the last release on https://github.com/RadeonOpenCompute/ROCm/releases

https://rocmdocs.amd.com/projects/alpha/en/develop/deploy/install.html

uhm... it doesn't seem to be publicly available

5.6.0? maybe that will include support for my 7900xtx

There was indeed a docker image for rocm 5.6.0 with 7900xtx support around, but it's now offline, so i guess that code was intended for internal testing and not supposed to be released yet. Anyway, there was a discussion here: #9591

I'm not sure if that works on other gpus like the 5700xt too, but i wouldn't be surprised if pytorch 2.0 starts working when the next rocm version is released.

I guess for 5700xt users the better choice is sticking to the old 1.13.1 version and waiting for an official rocm release

DGdev91 avatar Apr 18 '23 23:04 DGdev91

I'm pretty sure --extra-index-url https://download.pytorch.org/whl/cu118 for OSX is wrong but I don't have a mac to try it on.

AUTOMATIC1111 avatar Apr 29 '23 09:04 AUTOMATIC1111