stable-diffusion-webui
Add upcast options: full-precision sampling from a float16 UNet and upcasting attention, for inference using SD 2.1 models without --no-half
Detailed list of changes:
- Add `--upcast_sampling` (uses a monkey patch to `apply_model` to downcast/upcast between the sampler and the UNet, and a monkey patch to `timestep_embedding` to fix the dtype)
- Add `--upcast-attn` option so that inference works with SD 2.1 models with a float16 UNet (adds upcasting of q and k to torch.float32 for each cross attention layer optimization)
- Add support for using `--upcast_sampling` with older PyTorch versions (this accounts for the other 4 of the 6 monkey patches; a version-gating sketch follows this list)
- Add credit to README.md
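As a rough sketch of the version gating mentioned in the third item above (my illustration only; the helper name and version threshold are hypothetical, not the PR's actual code):

```python
import torch
from packaging import version  # common dependency; pip install packaging if missing

def apply_compat_patches():
    # Hypothetical gate: only install the extra compatibility monkey patches
    # when running on an older PyTorch build that needs them.
    if version.parse(torch.__version__) < version.parse("1.13"):
        pass  # install the older-PyTorch monkey patches here

apply_compat_patches()
```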
Thanks to marunine for the idea of sampling in full precision from a float16 UNet and Birch-san for the example implementation for Diffusers.
Also see: https://twitter.com/Birchlabs/status/1599903883278663681
This allows for using a float16 UNet while sampling in float32. It increases speed and decreases memory usage on some hardware that otherwise doesn't work without --no-half.
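For reference, a minimal sketch of the `apply_model` monkey patch (patterned on the `sd_hijack_unet.py` line that appears in the tracebacks below; the standalone setup here is illustrative rather than the PR's exact code):

```python
import torch
from ldm.models.diffusion.ddpm import LatentDiffusion  # assumes the ldm repo is importable

dtype_unet = torch.float16  # dtype the UNet weights are stored in
dtype = torch.float32       # dtype the sampler works in

orig_apply_model = LatentDiffusion.apply_model  # keep the original to call through

def apply_model(self, x_noisy, t, cond, **kwargs):
    # Downcast the float32 sampler inputs to the float16 UNet's dtype, run the
    # original forward, then upcast the prediction back to float32 so all of
    # the sampler math stays in full precision.
    return orig_apply_model(self, x_noisy.to(dtype_unet), t.to(dtype_unet), cond, **kwargs).to(dtype)

LatentDiffusion.apply_model = apply_model  # install the monkey patch
```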
To use the changes in this PR, run with --precision upcast and without --no-half.
Edit: Added the --upcast-attn option; with this option it should be possible to generate images using an SD 2.1 model without the --no-half option.
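A minimal sketch of what the q/k upcasting amounts to inside a cross attention optimization (illustrative only; the PR patches each optimization's forward rather than adding a helper like this):

```python
import torch

def attention_upcast(q, k, v, scale):
    # With --upcast-attn, q and k are upcast to float32 before the similarity
    # matmul and softmax, where SD 2.1's float16 activations can overflow to
    # inf/NaN (one cause of black output images).
    sim = torch.einsum('b i d, b j d -> b i j', q.float(), k.float()) * scale
    attn = sim.softmax(dim=-1).to(v.dtype)  # cast back so the value matmul stays float16
    return torch.einsum('b i j, b j d -> b i d', attn, v)
```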
This PR is tested and working on macOS (PyTorch MPS device). If you try this on ROCm or CUDA, please report if it is working for you or not. If it is not working for you, please also report your PyTorch version and Traceback or error message (if applicable).
Doesn't seem to work right now, tested on RX 5500 with ROCm.
Traceback (most recent call last):
File "/home/d/webui/modules/call_queue.py", line 45, in f
res = list(func(*args, **kwargs))
File "/home/d/webui/modules/call_queue.py", line 28, in f
res = func(*args, **kwargs)
File "/home/d/webui/modules/txt2img.py", line 52, in txt2img
processed = process_images(p)
File "/home/d/webui/modules/processing.py", line 479, in process_images
res = process_images_inner(p)
File "/home/d/webui/modules/processing.py", line 597, in process_images_inner
uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps, cached_uc)
File "/home/d/webui/modules/processing.py", line 565, in get_conds_with_caching
cache[1] = function(shared.sd_model, required_prompts, steps)
File "/home/d/webui/modules/prompt_parser.py", line 138, in get_learned_conditioning
conds = model.get_learned_conditioning(texts)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
c = self.cond_stage_model(c)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/modules/sd_hijack_clip.py", line 220, in forward
z = self.process_tokens(tokens, multipliers)
File "/home/d/webui/modules/sd_hijack_clip.py", line 245, in process_tokens
z = self.encode_with_transformers(tokens)
File "/home/d/webui/modules/sd_hijack_clip.py", line 293, in encode_with_transformers
outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
result = forward_call(*input, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 722, in forward
return self.text_model(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 643, in forward
encoder_outputs = self.encoder(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 574, in forward
layer_outputs = encoder_layer(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 317, in forward
hidden_states, attn_weights = self.self_attn(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 257, in forward
attn_output = torch.bmm(attn_probs, value_states)
RuntimeError: expected scalar type Half but found Float
COMMANDLINE_ARGS are --precision upcast --medvram --opt-split-attention-v1 --always-batch-cond-uncond --deepdanbooru.
Also on ROCm. Should it work with ROCm?
With the new option --upcastattn I am getting an error (full command: --opt-split-attention --opt-channelslast --medvram --precision upcast --upcastattn --opt-sub-quad-attention).
Traceback
Traceback (most recent call last):
File "/tmp/stable-diffusion-webui/modules/call_queue.py", line 45, in f
res = list(func(*args, **kwargs))
File "/tmp/stable-diffusion-webui/modules/call_queue.py", line 28, in f
res = func(*args, **kwargs)
File "/tmp/stable-diffusion-webui/modules/txt2img.py", line 52, in txt2img
processed = process_images(p)
File "/tmp/stable-diffusion-webui/modules/processing.py", line 479, in process_images
res = process_images_inner(p)
File "/tmp/stable-diffusion-webui/modules/processing.py", line 597, in process_images_inner
uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps, cached_uc)
File "/tmp/stable-diffusion-webui/modules/processing.py", line 565, in get_conds_with_caching
cache[1] = function(shared.sd_model, required_prompts, steps)
File "/tmp/stable-diffusion-webui/modules/prompt_parser.py", line 138, in get_learned_conditioning
conds = model.get_learned_conditioning(texts)
File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
c = self.cond_stage_model(c)
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-webui/modules/sd_hijack_clip.py", line 220, in forward
z = self.process_tokens(tokens, multipliers)
File "/tmp/stable-diffusion-webui/extensions/stable-diffusion-webui-aesthetic-gradients/aesthetic_clip.py", line 202, in __call__
z = self.process_tokens(remade_batch_tokens, multipliers)
File "/tmp/stable-diffusion-webui/modules/sd_hijack_clip.py", line 245, in process_tokens
z = self.encode_with_transformers(tokens)
File "/tmp/stable-diffusion-webui/modules/sd_hijack_open_clip.py", line 28, in encode_with_transformers
z = self.wrapped.encode_with_transformer(tokens)
File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/encoders/modules.py", line 177, in encode_with_transformer
x = self.text_transformer_forward(x, attn_mask=self.model.attn_mask)
File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/encoders/modules.py", line 189, in text_transformer_forward
x = r(x, attn_mask=attn_mask)
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-webui/auto/lib/python3.10/site-packages/open_clip/transformer.py", line 193, in forward
x = x + self.ls_1(self.attention(self.ln_1(x), attn_mask=attn_mask))
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-webui/auto/lib/python3.10/site-packages/open_clip/transformer.py", line 27, in forward
x = F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
File "/usr/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found Half
Only tested as working on MPS (macOS) so far, as far as I'm aware. It looks like --precision upcast may require a new enough version of PyTorch: I was testing on a pre-release build of PyTorch 2.0, where it works, but it seems to fail with PyTorch 1.12.1.
@brkirch, ok, got it, thanks :) I am on PyTorch 1.13.1 at the moment and it is throwing an error for me.
Tried after your latest commit :)
- with only `--precision upcast` it is working, but with an SD-2.1 model it gives black images as the result
- with `--precision upcast --upcastattn` it throws an error
Traceback
```python
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/usr/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 262, in _forward
    h = self.in_layers(x)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 395, in forward
    return F.silu(input, inplace=self.inplace)
  File "/usr/lib/python3.10/site-packages/torch/nn/functional.py", line 2059, in silu
    return torch._C._nn.silu(input)
TypeError: silu(): argument 'input' (position 1) must be Tensor, not NoneTyp
```
Hope it helps a bit :)
Try again with the latest commit and see if it works now.
Also, I just renamed the option from --upcastattn to --upcast-attn.
I'm seeing issues also, at least with an older PyTorch version. I don't have the time to fix it right now, but I'll take a look at it later.
Still getting expected scalar type Half but found Float with all the latest changes on torch 1.13.1+rocm5.2, traceback is the same aside from line changes.
Traceback
Traceback (most recent call last):
File "/home/d/webui/modules/call_queue.py", line 45, in f
res = list(func(*args, **kwargs))
File "/home/d/webui/modules/call_queue.py", line 28, in f
res = func(*args, **kwargs)
File "/home/d/webui/modules/txt2img.py", line 52, in txt2img
processed = process_images(p)
File "/home/d/webui/modules/processing.py", line 479, in process_images
res = process_images_inner(p)
File "/home/d/webui/modules/processing.py", line 597, in process_images_inner
uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps, cached_uc)
File "/home/d/webui/modules/processing.py", line 565, in get_conds_with_caching
cache[1] = function(shared.sd_model, required_prompts, steps)
File "/home/d/webui/modules/prompt_parser.py", line 138, in get_learned_conditioning
conds = model.get_learned_conditioning(texts)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
c = self.cond_stage_model(c)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/modules/sd_hijack_clip.py", line 220, in forward
z = self.process_tokens(tokens, multipliers)
File "/home/d/webui/modules/sd_hijack_clip.py", line 245, in process_tokens
z = self.encode_with_transformers(tokens)
File "/home/d/webui/modules/sd_hijack_clip.py", line 293, in encode_with_transformers
outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl
result = forward_call(*input, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 722, in forward
return self.text_model(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 643, in forward
encoder_outputs = self.encoder(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 574, in forward
layer_outputs = encoder_layer(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 317, in forward
hidden_states, attn_weights = self.self_attn(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 257, in forward
attn_output = torch.bmm(attn_probs, value_states)
RuntimeError: expected scalar type Half but found Float
The model is fp16 safetensors, but I've seen the same on an fp32 ckpt. No idea what else could cause this; I've tried turning the VAE and CLIP skip off, and nothing made a difference.
Hi, can this fix the issue related to torch 2.0.0 here? https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/6455#discussioncomment-4624142
I'm getting black images or noise just using torch 2.0.0 and those arguments.
(If you can/want, check the whole post to understand the issue a little better.)
Thanks in advance...
I added some upcasting so that OpenCLIP should function correctly; if SD 2.0 or 2.1 models weren't working with --precision upcast before, those models will hopefully work now (just remember to specify --upcast-attn if using SD 2.1 models).
@Nacurutu It probably isn't directly related if you're already using --no-half. --upcast-attn is for using SD 2.1 models without --no-half, because if you already specify --no-half then --upcast-attn shouldn't actually do anything.
Nope, still expected scalar type Half but found Float. Model is SD 1.x-based, forgot to mention.
~~What version of PyTorch are you using?~~ Sorry, I missed that you mentioned it in your earlier post.
@ddvarpdd I attempted to address the issue, please let me know if the latest changes fix it for you or not.
Edit: Sorry there was a mistake in the fix, update again if you tried it before.
This time it's a bit different: now it expects Float but finds Half.
Traceback
Traceback (most recent call last):
File "/home/d/webui/modules/call_queue.py", line 45, in f
res = list(func(*args, **kwargs))
File "/home/d/webui/modules/call_queue.py", line 28, in f
res = func(*args, **kwargs)
File "/home/d/webui/modules/txt2img.py", line 52, in txt2img
processed = process_images(p)
File "/home/d/webui/modules/processing.py", line 479, in process_images
res = process_images_inner(p)
File "/home/d/webui/modules/processing.py", line 598, in process_images_inner
c = get_conds_with_caching(prompt_parser.get_multicond_learned_conditioning, prompts, p.steps, cached_c)
File "/home/d/webui/modules/processing.py", line 565, in get_conds_with_caching
cache[1] = function(shared.sd_model, required_prompts, steps)
File "/home/d/webui/modules/prompt_parser.py", line 203, in get_multicond_learned_conditioning
learned_conditioning = get_learned_conditioning(model, prompt_flat_list, steps)
File "/home/d/webui/modules/prompt_parser.py", line 138, in get_learned_conditioning
conds = model.get_learned_conditioning(texts)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
c = self.cond_stage_model(c)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/modules/sd_hijack_clip.py", line 221, in forward
z = self.process_tokens(tokens, multipliers)
File "/home/d/webui/modules/sd_hijack_clip.py", line 246, in process_tokens
z = self.encode_with_transformers(tokens)
File "/home/d/webui/modules/sd_hijack_clip.py", line 294, in encode_with_transformers
outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl
result = forward_call(*input, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 722, in forward
return self.text_model(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 643, in forward
encoder_outputs = self.encoder(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 574, in forward
layer_outputs = encoder_layer(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 316, in forward
hidden_states = self.layer_norm1(hidden_states)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 190, in forward
return F.layer_norm(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found Half
@fractal-fumbler I know you mentioned using ROCm and PyTorch 1.13.1, but did you install with the wheels package or was it built from source? It seems likely there are some major differences between your build of PyTorch and the build @ddvarpdd is using.
If that's important, I installed it from the official wheel at https://download.pytorch.org/whl.
Built from source.
@fractal-fumbler I know you mentioned using ROCm and PyTorch 1.13.1, but did you install with the wheels package or was it built from source? It seems likely there are some major differences between your build of PyTorch and the build @ddvarpdd is using.
I used the PKGBUILD from https://github.com/rocm-arch/python-pytorch-rocm and compiled with Arch Linux's standard tool, makepkg.
Should I stop filing tracebacks with errors?
P.S. The reason for compiling from source: the officially unsupported architecture gfx1031. P.P.S. I haven't tested the latest release yet.
@ClashSAN Sorry to bother you with this, but would you be able to test this PR on CUDA? More specifically, if you could try --precision upcast (without --no-half) and see if you get any errors when generating images it would be much appreciated.
@ddvarpdd Unfortunately I'm going to have to put support for --precision upcast with the official PyTorch ROCm wheel on hold for the time being. If PyTorch 2.0 doesn't work correctly then I can take another look at it, but the official ROCm build of 1.13.1 just seems to be too temperamental when working with different dtypes. In the meantime, if you want to get it working you can try building a newer PyTorch from source. Sorry I don't have better news in that regard.
Should I stop filing tracebacks with errors?
@fractal-fumbler Please continue to post tracebacks when you have them. Actually the tracebacks are the only way I can have much of any idea what is going on, so try to avoid posting error messages without tracebacks if you have a traceback. Did you encounter more errors or is --upcast-attn working for you now?
Thank you for your time. I will be sure to test it out when PyTorch 2.0 support is a bit more finalized in both webui and ROCm.
@fractal-fumbler Please continue to post tracebacks when you have them. Actually the tracebacks are the only way I can have much of any idea what is going on, so try to avoid posting error messages without tracebacks if you have a traceback. Did you encounter more errors or is `--upcast-attn` working for you now?
With the latest commit 9b12dca043abb04d2b6c732b3530dfd443229549 I still encounter the error RuntimeError: expected scalar type Half but found Float.
Traceback
Traceback (most recent call last):
File "/tmp/stable-diffusion-webui/modules/call_queue.py", line 45, in f
res = list(func(*args, **kwargs))
File "/tmp/stable-diffusion-webui/modules/call_queue.py", line 28, in f
res = func(*args, **kwargs)
File "/tmp/stable-diffusion-webui/modules/txt2img.py", line 54, in txt2img
processed = process_images(p)
File "/tmp/stable-diffusion-webui/modules/processing.py", line 488, in process_images
res = process_images_inner(p)
File "/tmp/stable-diffusion-webui/modules/processing.py", line 617, in process_images_inner
samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
File "/tmp/stable-diffusion-webui/modules/processing.py", line 806, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x), mimic_scale=self.mimic_scale, threshold_enable=self.threshold_enable)
File "/tmp/stable-diffusion-webui/modules/sd_samplers.py", line 574, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "/tmp/stable-diffusion-webui/modules/sd_samplers.py", line 475, in launch_sampling
return func()
File "/tmp/stable-diffusion-webui/modules/sd_samplers.py", line 574, in <lambda>
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "/usr/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/tmp/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/sampling.py", line 145, in sample_euler_ancestral
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-webui/modules/sd_samplers.py", line 373, in forward
x_out = self.inner_model(x_in, sigma_in, cond={"c_crossattn": [cond_in], "c_concat": [image_cond_in]})
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 167, in forward
return self.get_v(input * c_in, self.sigma_to_t(sigma), **kwargs) * c_out + input * c_skip
File "/tmp/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 177, in get_v
return self.inner_model.apply_model(x, t, cond)
File "/tmp/stable-diffusion-webui/modules/sd_hijack_unet.py", line 46, in apply_model
return orig_apply_model(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, **kwargs).to(devices.dtype)
File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 858, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl
result = forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 1329, in forward
out = self.diffusion_model(x, t, context=cc)
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 776, in forward
h = module(h, emb, context)
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 84, in forward
x = layer(x, context)
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 334, in forward
x = block(x, context=context[i])
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-webui/modules/sd_hijack_checkpoint.py", line 4, in BasicTransformerBlock_forward
return checkpoint(self._forward, x, context)
File "/usr/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/usr/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 272, in _forward
x = self.attn1(self.norm1(x), context=context if self.disable_self_attn else None) + x
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-webui/modules/sd_hijack_optimizations.py", line 260, in sub_quad_attention_forward
x = sub_quad_attention(q, k, v, q_chunk_size=shared.cmd_opts.sub_quad_q_chunk_size, kv_chunk_size=shared.cmd_opts.sub_quad_kv_chunk_size, chunk_threshold=shared.cmd_opts.sub_quad_chunk_threshold, use_checkpoint=self.training)
File "/tmp/stable-diffusion-webui/modules/sd_hijack_optimizations.py", line 296, in sub_quad_attention
return efficient_dot_product_attention(
File "/tmp/stable-diffusion-webui/modules/sub_quadratic_attention.py", line 207, in efficient_dot_product_attention
res = torch.cat([
File "/tmp/stable-diffusion-webui/modules/sub_quadratic_attention.py", line 208, in <listcomp>
compute_query_chunk_attn(
File "/tmp/stable-diffusion-webui/modules/sub_quadratic_attention.py", line 132, in _get_attention_scores_no_kv_chunking
hidden_states_slice = torch.bmm(attn_probs, value)
RuntimeError: expected scalar type Half but found Float
python version output
python -c "import torch; print(torch.__version__); print(torch.randn(1).cuda())"
1.13.1
tensor([0.4277], device='cuda:0')
And thanks a lot for your time, I really appreciate it :)
Update 1 - launch command
cmd
PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,roundup_power2_divisions:4,max_split_size_mb:128 PYTHONPATH=/tmp/stable-diffusion-webui python launch.py --theme=dark --opt-split-attention --opt-channelslast --always-batch-cond-uncond --medvram --opt-sub-quad-attention --precision upcast --upcast-attn 2>&1 | tee /tmp/log
Tested it with PyTorch 2.0 on ROCm.
Using the one built against ROCm 5.3 (2.0.0.dev20230111+rocm5.3) results in webui simply hanging and doing nothing after clicking generate. This happens even without this patch, so it's likely some upstream issue; mentioning it just in case someone else stumbles upon it.
The ROCm 5.2-based Torch (2.0.0.dev20230111+rocm5.2) works fine without the patch, but adding it and enabling --precision upcast once again results in RuntimeError: expected scalar type Half but found Float.
Traceback
Traceback (most recent call last):
File "/home/d/webui/modules/call_queue.py", line 45, in f
res = list(func(*args, **kwargs))
File "/home/d/webui/modules/call_queue.py", line 28, in f
res = func(*args, **kwargs)
File "/home/d/webui/modules/txt2img.py", line 52, in txt2img
processed = process_images(p)
File "/home/d/webui/modules/processing.py", line 479, in process_images
res = process_images_inner(p)
File "/home/d/webui/modules/processing.py", line 597, in process_images_inner
uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps, cached_uc)
File "/home/d/webui/modules/processing.py", line 565, in get_conds_with_caching
cache[1] = function(shared.sd_model, required_prompts, steps)
File "/home/d/webui/modules/prompt_parser.py", line 140, in get_learned_conditioning
conds = model.get_learned_conditioning(texts)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
c = self.cond_stage_model(c)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/modules/sd_hijack_clip.py", line 220, in forward
z = self.process_tokens(tokens, multipliers)
File "/home/d/webui/modules/sd_hijack_clip.py", line 245, in process_tokens
z = self.encode_with_transformers(tokens)
File "/home/d/webui/modules/sd_hijack_clip.py", line 293, in encode_with_transformers
outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 722, in forward
return self.text_model(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 643, in forward
encoder_outputs = self.encoder(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 574, in forward
layer_outputs = encoder_layer(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 317, in forward
hidden_states, attn_weights = self.self_attn(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 257, in forward
attn_output = torch.bmm(attn_probs, value_states)
RuntimeError: expected scalar type Half but found Float
No luck when reverting 9b12dca, either.
Traceback
Traceback (most recent call last):
File "/home/d/webui/modules/call_queue.py", line 45, in f
res = list(func(*args, **kwargs))
File "/home/d/webui/modules/call_queue.py", line 28, in f
res = func(*args, **kwargs)
File "/home/d/webui/modules/txt2img.py", line 52, in txt2img
processed = process_images(p)
File "/home/d/webui/modules/processing.py", line 479, in process_images
res = process_images_inner(p)
File "/home/d/webui/modules/processing.py", line 598, in process_images_inner
c = get_conds_with_caching(prompt_parser.get_multicond_learned_conditioning, prompts, p.steps, cached_c)
File "/home/d/webui/modules/processing.py", line 565, in get_conds_with_caching
cache[1] = function(shared.sd_model, required_prompts, steps)
File "/home/d/webui/modules/prompt_parser.py", line 205, in get_multicond_learned_conditioning
learned_conditioning = get_learned_conditioning(model, prompt_flat_list, steps)
File "/home/d/webui/modules/prompt_parser.py", line 140, in get_learned_conditioning
conds = model.get_learned_conditioning(texts)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
c = self.cond_stage_model(c)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/modules/sd_hijack_clip.py", line 221, in forward
z = self.process_tokens(tokens, multipliers)
File "/home/d/webui/modules/sd_hijack_clip.py", line 246, in process_tokens
z = self.encode_with_transformers(tokens)
File "/home/d/webui/modules/sd_hijack_clip.py", line 294, in encode_with_transformers
outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 722, in forward
return self.text_model(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 643, in forward
encoder_outputs = self.encoder(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 574, in forward
layer_outputs = encoder_layer(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 316, in forward
hidden_states = self.layer_norm1(hidden_states)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 190, in forward
return F.layer_norm(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found Half
I've taken another attempt at fixing the ROCm issues.
@fractal-fumbler Try again to see if it is working for you now, and if it is please also try some larger image sizes to see if you encounter errors.
@ddvarpdd You can try again with PyTorch 1.13.1, I'm going to take another try at getting it working. Just let me know if you encounter any more errors.
Neat!
I was able to generate a picture with txt2img with this command:
Command
PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,roundup_power2_divisions:4,max_split_size_mb:128 PYTHONPATH=/tmp/stable-diffusion-webui python launch.py --theme=dark --opt-split-attention --opt-channelslast --always-batch-cond-uncond --medvram --opt-sub-quad-attention --precision upcast --upcast-attn
VRAM consumption for a 768x768 picture was reduced from 6.7 GB to 5.2 GB.
I've taken another attempt at fixing the ROCm issues.
@fractal-fumbler Try again to see if it is working for you now, and if it is please also try some larger image sizes to see if you encounter errors.
Though a new issue arose: the error RuntimeError: expected scalar type Float but found Half when there is a textual inversion embedding in the prompt.
Used: test prompt with classipeint as embedding
Traceback
Arguments: ('test prompt with classipeint as embedding', '', 'None', 'None', 20, 9, False, False, 1, 1, 7, 7.5, False, 2282081976.0, -1.0, 0, 0, 0, False, 768, 768, False, 0.7, 2, 'Latent', 0, 0, 0, 0, None, '', 'outputs', False, False, False, False, '', 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
File "/tmp/stable-diffusion-webui/modules/call_queue.py", line 45, in f
res = list(func(*args, **kwargs))
File "/tmp/stable-diffusion-webui/modules/call_queue.py", line 28, in f
res = func(*args, **kwargs)
File "/tmp/stable-diffusion-webui/modules/txt2img.py", line 54, in txt2img
processed = process_images(p)
File "/tmp/stable-diffusion-webui/modules/processing.py", line 488, in process_images
res = process_images_inner(p)
File "/tmp/stable-diffusion-webui/modules/processing.py", line 607, in process_images_inner
c = get_conds_with_caching(prompt_parser.get_multicond_learned_conditioning, prompts, p.steps, cached_c)
File "/tmp/stable-diffusion-webui/modules/processing.py", line 574, in get_conds_with_caching
cache[1] = function(shared.sd_model, required_prompts, steps)
File "/tmp/stable-diffusion-webui/modules/prompt_parser.py", line 205, in get_multicond_learned_conditioning
learned_conditioning = get_learned_conditioning(model, prompt_flat_list, steps)
File "/tmp/stable-diffusion-webui/modules/prompt_parser.py", line 140, in get_learned_conditioning
conds = model.get_learned_conditioning(texts)
File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
c = self.cond_stage_model(c)
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-webui/modules/sd_hijack_clip.py", line 222, in forward
z = self.process_tokens(tokens, multipliers)
File "/tmp/stable-diffusion-webui/modules/sd_hijack_clip.py", line 247, in process_tokens
z = self.encode_with_transformers(tokens)
File "/tmp/stable-diffusion-webui/modules/sd_hijack_open_clip.py", line 30, in encode_with_transformers
z = self.wrapped.encode_with_transformer(tokens)
File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/encoders/modules.py", line 177, in encode_with_transformer
x = self.text_transformer_forward(x, attn_mask=self.model.attn_mask)
File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/encoders/modules.py", line 189, in text_transformer_forward
x = r(x, attn_mask=attn_mask)
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-webui/auto/lib/python3.10/site-packages/open_clip/transformer.py", line 193, in forward
x = x + self.ls_1(self.attention(self.ln_1(x), attn_mask=attn_mask))
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-webui/auto/lib/python3.10/site-packages/open_clip/transformer.py", line 27, in forward
x = F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
File "/usr/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found Half
Downloaded the embedding from https://huggingface.co/EldritchAdam/classipeint
Tested both on 1.13.1 and the same 2.0.0 nightly, no embeddings used, similar error but with much larger traceback on both.
1.13.1
Traceback (most recent call last):
File "/home/d/webui/modules/call_queue.py", line 45, in f
res = list(func(*args, **kwargs))
File "/home/d/webui/modules/call_queue.py", line 28, in f
res = func(*args, **kwargs)
File "/home/d/webui/modules/txt2img.py", line 52, in txt2img
processed = process_images(p)
File "/home/d/webui/modules/processing.py", line 479, in process_images
res = process_images_inner(p)
File "/home/d/webui/modules/processing.py", line 608, in process_images_inner
samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
File "/home/d/webui/modules/processing.py", line 797, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
File "/home/d/webui/modules/sd_samplers.py", line 537, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "/home/d/webui/modules/sd_samplers.py", line 440, in launch_sampling
return func()
File "/home/d/webui/modules/sd_samplers.py", line 537, in <lambda>
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/d/webui/repositories/k-diffusion/k_diffusion/sampling.py", line 594, in sample_dpmpp_2m
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/modules/sd_samplers.py", line 351, in forward
x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond={"c_crossattn": [tensor[a:b]], "c_concat": [image_cond_in[a:b]]})
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/repositories/k-diffusion/k_diffusion/external.py", line 112, in forward
eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
File "/home/d/webui/repositories/k-diffusion/k_diffusion/external.py", line 138, in get_eps
return self.inner_model.apply_model(*args, **kwargs)
File "/home/d/webui/modules/sd_hijack_unet.py", line 46, in apply_model
return orig_apply_model(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, **kwargs).to(devices.dtype)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 858, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl
result = forward_call(*input, **kwargs)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 1329, in forward
out = self.diffusion_model(x, t, context=cc)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 776, in forward
h = module(h, emb, context)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 82, in forward
x = layer(x, emb)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/modules/sd_hijack_checkpoint.py", line 10, in ResBlock_forward
return checkpoint(self._forward, x, emb)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 262, in _forward
h = self.in_layers(x)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/container.py", line 204, in forward
input = module(input)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/util.py", line 219, in forward
return super().forward(x.float()).type(x.dtype)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 273, in forward
return F.group_norm(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2528, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found Half
2.0.0.dev20230111
Traceback (most recent call last):
File "/home/d/webui/modules/call_queue.py", line 45, in f
res = list(func(*args, **kwargs))
File "/home/d/webui/modules/call_queue.py", line 28, in f
res = func(*args, **kwargs)
File "/home/d/webui/modules/txt2img.py", line 52, in txt2img
processed = process_images(p)
File "/home/d/webui/modules/processing.py", line 479, in process_images
res = process_images_inner(p)
File "/home/d/webui/modules/processing.py", line 608, in process_images_inner
samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
File "/home/d/webui/modules/processing.py", line 797, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
File "/home/d/webui/modules/sd_samplers.py", line 537, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "/home/d/webui/modules/sd_samplers.py", line 440, in launch_sampling
return func()
File "/home/d/webui/modules/sd_samplers.py", line 537, in <lambda>
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/d/webui/repositories/k-diffusion/k_diffusion/sampling.py", line 594, in sample_dpmpp_2m
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/modules/sd_samplers.py", line 351, in forward
x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond={"c_crossattn": [tensor[a:b]], "c_concat": [image_cond_in[a:b]]})
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/repositories/k-diffusion/k_diffusion/external.py", line 112, in forward
eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
File "/home/d/webui/repositories/k-diffusion/k_diffusion/external.py", line 138, in get_eps
return self.inner_model.apply_model(*args, **kwargs)
File "/home/d/webui/modules/sd_hijack_unet.py", line 46, in apply_model
return orig_apply_model(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, **kwargs).to(devices.dtype)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 858, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 1329, in forward
out = self.diffusion_model(x, t, context=cc)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 776, in forward
h = module(h, emb, context)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 82, in forward
x = layer(x, emb)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/modules/sd_hijack_checkpoint.py", line 10, in ResBlock_forward
return checkpoint(self._forward, x, emb)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 453, in apply
return super().apply(*args, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 262, in _forward
h = self.in_layers(x)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
return forward_call(*args, **kwargs)
File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/util.py", line 219, in forward
return super().forward(x.float()).type(x.dtype)
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 273, in forward
return F.group_norm(
File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2530, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: mixed dtype (CPU): expect parameter to have scalar type of Float
Testing further, no-half vs upcast :)

Examples with --no-half --precision full --no-half-vae:
[image: no-half]

Examples with --precision upcast --upcast-attn:
[image: upcast]
@fractal-fumbler @ddvarpdd I've renamed --precision upcast to --upcast_sampling. Please try using --upcast_sampling without --precision. This should re-enable autocast, which will hopefully fix any remaining issues (or it may break everything; I have no way to be sure unless someone with ROCm tests it). That said, even if it seems to work, if @fractal-fumbler has any images from before this change that can be regenerated, then it would be a good idea to check and make sure regenerating results in an identical image.
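For context, a small self-contained demo of why re-enabling autocast matters (my illustration, not webui code): without autocast, mixing dtypes raises exactly the error reported in this thread, while under autocast PyTorch casts per operation.

```python
import torch

device = "cuda"  # ROCm builds of PyTorch also identify as "cuda"
x = torch.randn(4, 4, device=device, dtype=torch.float16)
w = torch.randn(4, 4, device=device, dtype=torch.float32)

# Outside autocast this matmul fails with
# "RuntimeError: expected scalar type Half but found Float".
with torch.autocast(device):
    y = x @ w  # autocast casts both operands for the op
print(y.dtype)  # torch.float16
```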
Please try using `--upcast_sampling` without `--precision`

@brkirch, the latest patch (96093475731c2f95fba3911ee66e5065deb21005):

- `--opt-split-attention --opt-channelslast --always-batch-cond-uncond --medvram --opt-sub-quad-attention --upcast-sampling --upcast-attn` gives black images
- `--opt-split-attention --opt-channelslast --always-batch-cond-uncond --medvram --opt-sub-quad-attention --precision upcast --upcast-att --precision full` gives the error RuntimeError: expected scalar type Half but found Float
Traceback
Traceback (most recent call last):
File "/tmp/stable-diffusion-20/modules/call_queue.py", line 45, in f
res = list(func(*args, **kwargs))
File "/tmp/stable-diffusion-20/modules/call_queue.py", line 28, in f
res = func(*args, **kwargs)
File "/tmp/stable-diffusion-20/modules/txt2img.py", line 54, in txt2img
processed = process_images(p)
File "/tmp/stable-diffusion-20/modules/processing.py", line 488, in process_images
res = process_images_inner(p)
File "/tmp/stable-diffusion-20/modules/processing.py", line 617, in process_images_inner
samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
File "/tmp/stable-diffusion-20/modules/processing.py", line 806, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x), mimic_scale=self.mimic_scale, threshold_enable=self.threshold_enable)
File "/tmp/stable-diffusion-20/modules/sd_samplers.py", line 574, in sample
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "/tmp/stable-diffusion-20/modules/sd_samplers.py", line 475, in launch_sampling
return func()
File "/tmp/stable-diffusion-20/modules/sd_samplers.py", line 574, in <lambda>
samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
File "/usr/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/tmp/stable-diffusion-20/repositories/k-diffusion/k_diffusion/sampling.py", line 145, in sample_euler_ancestral
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-20/modules/sd_samplers.py", line 373, in forward
x_out = self.inner_model(x_in, sigma_in, cond={"c_crossattn": [cond_in], "c_concat": [image_cond_in]})
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-20/repositories/k-diffusion/k_diffusion/external.py", line 167, in forward
return self.get_v(input * c_in, self.sigma_to_t(sigma), **kwargs) * c_out + input * c_skip
File "/tmp/stable-diffusion-20/repositories/k-diffusion/k_diffusion/external.py", line 177, in get_v
return self.inner_model.apply_model(x, t, cond)
File "/tmp/stable-diffusion-20/modules/sd_hijack_unet.py", line 46, in apply_model
return orig_apply_model(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, **kwargs).to(torch.float32)
File "/tmp/stable-diffusion-20/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 858, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl
result = forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-20/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 1329, in forward
out = self.diffusion_model(x, t, context=cc)
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-20/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 776, in forward
h = module(h, emb, context)
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-20/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 84, in forward
x = layer(x, context)
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-20/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 334, in forward
x = block(x, context=context[i])
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-20/modules/sd_hijack_checkpoint.py", line 4, in BasicTransformerBlock_forward
return checkpoint(self._forward, x, context)
File "/usr/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/usr/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
outputs = run_function(*args)
File "/tmp/stable-diffusion-20/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 272, in _forward
x = self.attn1(self.norm1(x), context=context if self.disable_self_attn else None) + x
File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/tmp/stable-diffusion-20/modules/sd_hijack_optimizations.py", line 260, in sub_quad_attention_forward
x = sub_quad_attention(q, k, v, q_chunk_size=shared.cmd_opts.sub_quad_q_chunk_size, kv_chunk_size=shared.cmd_opts.sub_quad_kv_chunk_size, chunk_threshold=shared.cmd_opts.sub_quad_chunk_threshold, use_checkpoint=self.training)
File "/tmp/stable-diffusion-20/modules/sd_hijack_optimizations.py", line 296, in sub_quad_attention
return efficient_dot_product_attention(
File "/tmp/stable-diffusion-20/modules/sub_quadratic_attention.py", line 207, in efficient_dot_product_attention
res = torch.cat([
File "/tmp/stable-diffusion-20/modules/sub_quadratic_attention.py", line 208, in <listcomp>
compute_query_chunk_attn(
File "/tmp/stable-diffusion-20/modules/sub_quadratic_attention.py", line 132, in _get_attention_scores_no_kv_chunking
hidden_states_slice = torch.bmm(attn_probs, value)
RuntimeError: expected scalar type Half but found Float
- `--opt-split-attention --opt-channelslast --always-batch-cond-uncond --medvram --opt-sub-quad-attention --upcast-sampling` gives black images too

Usually black images on SD-2.1 are generated when --no-half (i.e. fp32) isn't used in the generation process.
Disappointing, but I did think there was a decent chance it wouldn't work, and it was definitely worth a try. Thank you for testing; I'll revert those changes for now and continue working on a fix for the embeddings error and the error @ddvarpdd is getting.
I hadn't checked, but FYI, 96093475731c2f95fba3911ee66e5065deb21005 and 280083c9c30058d092c6b6f6aadac5e669b322fc didn't throw an error with an embedding being used, though they gave black images.
@ddvarpdd is your model SD-1.5, SD-2.0, or SD-2.1?
So, @brkirch, 96093475731c2f95fba3911ee66e5065deb21005 is working on SD-1.5 and SD-2.0 (for me on PyTorch 1.13.1), including embeddings :)
Also tested on 280083c9c30058d092c6b6f6aadac5e669b322fc - it works.
Getting increased speed with Euler a, from 1.3 it/s to 1.6 it/s, on 768x768 picture generation.
P.S. Sorry, that wasn't tested on SD-2.0 or SD-1.5.
Sounds like getting autocast turned back on did the trick after all! Thank you both for testing!
@ddvarpdd Testing on PyTorch 1.13.1 would probably be a good idea, as 2.0 is probably the cause of the black squares. Also if you could try commit 96093475731c2f95fba3911ee66e5065deb21005 with a SD 1.5 model and see if it works then I’d much appreciate it.
Accidentally bumped the “Close with comment” button (I’m on a phone right now).
So the AMD (ROCm) problem with SD-2.1 is that there is no xformers available, and AMD users can't force SD-2.1 to use fp16, since that only works with xformers installed.
https://github.com/Stability-AI/stablediffusion
Per default, the attention operation of the model is evaluated at full precision when xformers is not installed. To enable fp16 (which can cause numerical instabilities with the vanilla attention module on the v2.1 model), run your script with ATTN_PRECISION=fp16 python
That's why generation on SD-2.1 for AMD users is only possible with --no-half.
@fractal-fumbler Cross attention layer optimizations actually override that by monkey patching CrossAttention.forward(). I suspect the issue you were seeing is due to autocast evaluating entirely at float16 precision rather than evaluating at float32 precision and downcasting. It's not hard to fix, fortunately.
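For anyone following along, the override mechanism is plain attribute replacement on the class (a bare-bones sketch; webui's sd_hijack selects among several optimized implementations rather than using a stub like this):

```python
import ldm.modules.attention  # the module the tracebacks above come from

def optimized_forward(self, x, context=None, mask=None):
    # An optimized attention implementation would go here. Because every call
    # routes through this function, it can upcast q and k itself, regardless
    # of upstream's ATTN_PRECISION environment variable.
    raise NotImplementedError

# Replacing the method on the class affects all existing instances too.
ldm.modules.attention.CrossAttention.forward = optimized_forward
```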
@ddvarpdd The most recent change will default a few more things to float16 and so it could be causing a few issues. I haven’t had the chance to test the upscalers just yet. It does however sound like it is probably being caused by running out of memory. If you haven’t already, try running without the new options and see if you get the same issue. If you do, then it is probably not directly related to this PR.
After pulling the latest commits with 96093475731c2f95fba3911ee66e5065deb21005, I am getting VRAM usage of around 4.7-4.8 GB when generating 768x768 and 1024x768 (hires fix). Update: without --medvram it's more like 5.0-5.1 GB of VRAM usage.
Using an SD-2.0 model, though.
Comparison of speed (it/s | s/it): I didn't notice any big difference with brkirch's patch; it's like 0.2-0.5 seconds more.
@fractal-fumbler See if the latest change prevents black images with --upcast-attn.