Add upcast options, full precision sampling from float16 UNet and upcasting attention for inference using SD 2.1 models without --no-half

brkirch opened this pull request 2 years ago • 41 comments

Detailed list of changes:

  • Add --upcast-sampling (adds a monkey patch to apply_model that downcasts inputs to the UNet's dtype and upcasts its output back to the sampling dtype, plus a monkey patch to timestep_embedding to fix the dtype; see the sketch after this list)
  • Add the --upcast-attn option so that inference works with SD 2.1 models with a float16 UNet (adds upcasting of q and k to torch.float32 for each cross attention layer optimization)
  • Add support for using --upcast-sampling with older PyTorch versions (this accounts for the other 4 of the 6 monkey patches)
  • Add credit to README.md
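
The sampling change itself is small. Below is a minimal sketch of the apply_model monkey patch described in the first item, simplified from the hijack visible in the tracebacks later in this thread (modules/sd_hijack_unet.py); the dtype settings are module-level constants here for illustration, whereas the real values live in modules/devices.py:

```python
import torch
import ldm.models.diffusion.ddpm as ddpm

dtype_unet = torch.float16  # dtype of the UNet weights
dtype = torch.float32       # dtype the sampler works in

orig_apply_model = ddpm.LatentDiffusion.apply_model

def apply_model(self, x_noisy, t, cond, **kwargs):
    # Downcast the sampler's float32 tensors at the UNet boundary, run the
    # float16 UNet as usual, then upcast its prediction back to float32 so
    # the rest of the sampling loop stays at full precision.
    return orig_apply_model(self, x_noisy.to(dtype_unet), t.to(dtype_unet), cond, **kwargs).to(dtype)

ddpm.LatentDiffusion.apply_model = apply_model
```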

Thanks to marunine for the idea of sampling in full precision from a float16 UNet and Birch-san for the example implementation for Diffusers.

Also see: https://twitter.com/Birchlabs/status/1599903883278663681

This allows a float16 UNet to be used while sampling in float32. It increases speed and decreases memory usage on some hardware that otherwise doesn't work without --no-half.

To use the changes in this PR, run with --precision upcast and without --no-half.

Edit: Added the --upcast-attn option; with it, it should be possible to generate images using an SD 2.1 model without the --no-half option.
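
For reference, a minimal sketch of what the attention upcast does, assuming the usual q/k/v cross-attention layout (illustrative only, not the exact webui optimization code):

```python
import torch

def attention(q, k, v, scale):
    # Upcast q and k to float32 so the similarity matrix and its softmax are
    # computed at full precision; SD 2.1's attention can overflow in float16,
    # which otherwise tends to produce black images.
    sim = torch.einsum('b i d, b j d -> b i j', q.float(), k.float()) * scale
    attn = sim.softmax(dim=-1).to(v.dtype)  # cast back to the model dtype
    return torch.einsum('b i j, b j d -> b i d', attn, v)
```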

This PR is tested and working on macOS (PyTorch MPS device). If you try this on ROCm or CUDA, please report whether or not it is working for you. If it is not, please also report your PyTorch version and the traceback or error message (if applicable).

brkirch avatar Jan 08 '23 09:01 brkirch

Doesn't seem to work right now, tested on RX 5500 with ROCm.

Traceback (most recent call last):
  File "/home/d/webui/modules/call_queue.py", line 45, in f
    res = list(func(*args, **kwargs))
  File "/home/d/webui/modules/call_queue.py", line 28, in f
    res = func(*args, **kwargs)
  File "/home/d/webui/modules/txt2img.py", line 52, in txt2img
    processed = process_images(p)
  File "/home/d/webui/modules/processing.py", line 479, in process_images
    res = process_images_inner(p)
  File "/home/d/webui/modules/processing.py", line 597, in process_images_inner
    uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps, cached_uc)
  File "/home/d/webui/modules/processing.py", line 565, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps)
  File "/home/d/webui/modules/prompt_parser.py", line 138, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/modules/sd_hijack_clip.py", line 220, in forward
    z = self.process_tokens(tokens, multipliers)
  File "/home/d/webui/modules/sd_hijack_clip.py", line 245, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "/home/d/webui/modules/sd_hijack_clip.py", line 293, in encode_with_transformers
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1148, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 722, in forward
    return self.text_model(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 643, in forward
    encoder_outputs = self.encoder(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 574, in forward
    layer_outputs = encoder_layer(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 317, in forward
    hidden_states, attn_weights = self.self_attn(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 257, in forward
    attn_output = torch.bmm(attn_probs, value_states)
RuntimeError: expected scalar type Half but found Float

COMMANDLINE_ARGS are --precision upcast --medvram --opt-split-attention-v1 --always-batch-cond-uncond --deepdanbooru.
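
(For context, this class of error is simply torch.bmm refusing mixed dtypes; a standalone repro with arbitrary shapes, unrelated to the webui tensors:)

```python
import torch

attn_probs = torch.randn(1, 2, 3, dtype=torch.float16)     # half, like the model weights
value_states = torch.randn(1, 3, 4, dtype=torch.float32)   # float32, e.g. left over from upcasting
torch.bmm(attn_probs, value_states)  # raises the same dtype-mismatch RuntimeError
                                     # (exact wording may vary by device and build)
```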

ddvarpdd avatar Jan 08 '23 10:01 ddvarpdd

Also on ROCm. Should it work with ROCm?

With the new --upcastattn option I am getting an error (full command: --opt-split-attention --opt-channelslast --medvram --precision upcast --upcastattn --opt-sub-quad-attention)

Traceback
Traceback (most recent call last):
  File "/tmp/stable-diffusion-webui/modules/call_queue.py", line 45, in f
    res = list(func(*args, **kwargs))
  File "/tmp/stable-diffusion-webui/modules/call_queue.py", line 28, in f
    res = func(*args, **kwargs)
  File "/tmp/stable-diffusion-webui/modules/txt2img.py", line 52, in txt2img
    processed = process_images(p)
  File "/tmp/stable-diffusion-webui/modules/processing.py", line 479, in process_images
    res = process_images_inner(p)
  File "/tmp/stable-diffusion-webui/modules/processing.py", line 597, in process_images_inner
    uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps, cached_uc)
  File "/tmp/stable-diffusion-webui/modules/processing.py", line 565, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps)
  File "/tmp/stable-diffusion-webui/modules/prompt_parser.py", line 138, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-webui/modules/sd_hijack_clip.py", line 220, in forward
    z = self.process_tokens(tokens, multipliers)
  File "/tmp/stable-diffusion-webui/extensions/stable-diffusion-webui-aesthetic-gradients/aesthetic_clip.py", line 202, in __call__
    z = self.process_tokens(remade_batch_tokens, multipliers)
  File "/tmp/stable-diffusion-webui/modules/sd_hijack_clip.py", line 245, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "/tmp/stable-diffusion-webui/modules/sd_hijack_open_clip.py", line 28, in encode_with_transformers
    z = self.wrapped.encode_with_transformer(tokens)
  File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/encoders/modules.py", line 177, in encode_with_transformer
    x = self.text_transformer_forward(x, attn_mask=self.model.attn_mask)
  File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/encoders/modules.py", line 189, in text_transformer_forward
    x = r(x, attn_mask=attn_mask)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-webui/auto/lib/python3.10/site-packages/open_clip/transformer.py", line 193, in forward
    x = x + self.ls_1(self.attention(self.ln_1(x), attn_mask=attn_mask))
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-webui/auto/lib/python3.10/site-packages/open_clip/transformer.py", line 27, in forward
    x = F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
  File "/usr/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found Half

fractal-fumbler avatar Jan 08 '23 11:01 fractal-fumbler

Only tested as working on MPS (macOS) so far, as far as I'm aware. It actually looks like --precision upcast may require a new enough version of PyTorch: I was testing on a pre-release build of PyTorch 2.0, where it works, but it seems to fail with PyTorch 1.12.1.

brkirch avatar Jan 08 '23 11:01 brkirch

@brkirch, ok, got it, thanks :) I am on PyTorch 1.13.1 at the moment and it is throwing the error for me.

fractal-fumbler avatar Jan 08 '23 11:01 fractal-fumbler

Tried after your latest commit :)

  1. With only --precision upcast it is working, but with an SD 2.1 model the result is black images
  2. With --precision upcast --upcastattn it throws an error

Traceback

Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/usr/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 262, in _forward
    h = self.in_layers(x)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 395, in forward
    return F.silu(input, inplace=self.inplace)
  File "/usr/lib/python3.10/site-packages/torch/nn/functional.py", line 2059, in silu
    return torch._C._nn.silu(input)
TypeError: silu(): argument 'input' (position 1) must be Tensor, not NoneType

hope it helps a bit :)

fractal-fumbler avatar Jan 08 '23 14:01 fractal-fumbler

Try again with the latest commit and see if it works now.

brkirch avatar Jan 08 '23 15:01 brkirch

Also, I just renamed the option from --upcastattn to --upcast-attn.

brkirch avatar Jan 08 '23 15:01 brkirch

I'm seeing issues also, at least with an older PyTorch version. I don't have the time to fix it right now, but I'll take a look at it later.

brkirch avatar Jan 08 '23 15:01 brkirch

Still getting expected scalar type Half but found Float with all the latest changes on torch 1.13.1+rocm5.2; the traceback is the same aside from line number changes.

Traceback

Traceback (most recent call last):
  File "/home/d/webui/modules/call_queue.py", line 45, in f
    res = list(func(*args, **kwargs))
  File "/home/d/webui/modules/call_queue.py", line 28, in f
    res = func(*args, **kwargs)
  File "/home/d/webui/modules/txt2img.py", line 52, in txt2img
    processed = process_images(p)
  File "/home/d/webui/modules/processing.py", line 479, in process_images
    res = process_images_inner(p)
  File "/home/d/webui/modules/processing.py", line 597, in process_images_inner
    uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps, cached_uc)
  File "/home/d/webui/modules/processing.py", line 565, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps)
  File "/home/d/webui/modules/prompt_parser.py", line 138, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/modules/sd_hijack_clip.py", line 220, in forward
    z = self.process_tokens(tokens, multipliers)
  File "/home/d/webui/modules/sd_hijack_clip.py", line 245, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "/home/d/webui/modules/sd_hijack_clip.py", line 293, in encode_with_transformers
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 722, in forward
    return self.text_model(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 643, in forward
    encoder_outputs = self.encoder(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 574, in forward
    layer_outputs = encoder_layer(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 317, in forward
    hidden_states, attn_weights = self.self_attn(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 257, in forward
    attn_output = torch.bmm(attn_probs, value_states)
RuntimeError: expected scalar type Half but found Float

The model is an fp16 safetensors file, but I've seen the same with an fp32 ckpt. No idea what else could cause this; I've tried turning the VAE and CLIP skip off, and nothing made a difference.

ddvarpdd avatar Jan 08 '23 18:01 ddvarpdd

Hi, can I fix the issue related to torch 2.0.0 here with this? https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/6455#discussioncomment-4624142

I'm getting black images or noise just from using torch 2.0.0 and those arguments.

(If you can/want, check the whole post to understand the issue a little better.)

Thanks in advance...

Nacurutu avatar Jan 08 '23 18:01 Nacurutu

I added some upcasting so that OpenCLIP should function correctly, so if SD 2.0 or 2.1 models weren't working with --precision upcast before, they will hopefully work now (just remember to specify --upcast-attn if using SD 2.1 models).

@Nacurutu It probably isn't directly related if you're already using --no-half. --upcast-attn is for using SD 2.1 models without --no-half; if you already specify --no-half then --upcast-attn shouldn't actually do anything.
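
A rough sketch of that flag interaction, using a hypothetical helper (the real logic lives in the webui's option handling, not in a function like this):

```python
import torch
from modules import shared  # webui's shared command-line options (assumed available)

def attention_dtype() -> torch.dtype:
    # With --no-half the whole model already runs in float32, so there is
    # nothing left to upcast and --upcast-attn is effectively a no-op.
    if shared.cmd_opts.no_half or shared.cmd_opts.upcast_attn:
        return torch.float32
    return torch.float16
```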

brkirch avatar Jan 09 '23 01:01 brkirch

Nope, still getting expected scalar type Half but found Float. The model is SD 1.x-based; I forgot to mention that.

ddvarpdd avatar Jan 09 '23 14:01 ddvarpdd

~~What version of PyTorch are you using?~~ Sorry, I missed that you mentioned it in your earlier post.

brkirch avatar Jan 09 '23 15:01 brkirch

@ddvarpdd I attempted to address the issue, please let me know if the latest changes fix it for you or not.

Edit: Sorry, there was a mistake in the fix; update again if you tried it before.

brkirch avatar Jan 10 '23 03:01 brkirch

This time it's a bit different: now it expects Float but finds Half.

Traceback

Traceback (most recent call last):
  File "/home/d/webui/modules/call_queue.py", line 45, in f
    res = list(func(*args, **kwargs))
  File "/home/d/webui/modules/call_queue.py", line 28, in f
    res = func(*args, **kwargs)
  File "/home/d/webui/modules/txt2img.py", line 52, in txt2img
    processed = process_images(p)
  File "/home/d/webui/modules/processing.py", line 479, in process_images
    res = process_images_inner(p)
  File "/home/d/webui/modules/processing.py", line 598, in process_images_inner
    c = get_conds_with_caching(prompt_parser.get_multicond_learned_conditioning, prompts, p.steps, cached_c)
  File "/home/d/webui/modules/processing.py", line 565, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps)
  File "/home/d/webui/modules/prompt_parser.py", line 203, in get_multicond_learned_conditioning
    learned_conditioning = get_learned_conditioning(model, prompt_flat_list, steps)
  File "/home/d/webui/modules/prompt_parser.py", line 138, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/modules/sd_hijack_clip.py", line 221, in forward
    z = self.process_tokens(tokens, multipliers)
  File "/home/d/webui/modules/sd_hijack_clip.py", line 246, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "/home/d/webui/modules/sd_hijack_clip.py", line 294, in encode_with_transformers
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 722, in forward
    return self.text_model(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 643, in forward
    encoder_outputs = self.encoder(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 574, in forward
    layer_outputs = encoder_layer(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 316, in forward
    hidden_states = self.layer_norm1(hidden_states)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 190, in forward
    return F.layer_norm(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found Half

ddvarpdd avatar Jan 10 '23 09:01 ddvarpdd

@fractal-fumbler I know you mentioned using ROCm and PyTorch 1.13.1, but did you install with the wheels package or was it built from source? It seems likely there are some major differences between your build of PyTorch and the build @ddvarpdd is using.

brkirch avatar Jan 10 '23 10:01 brkirch

If that's important: I installed it from the official wheel at https://download.pytorch.org/whl.

ddvarpdd avatar Jan 10 '23 11:01 ddvarpdd

@fractal-fumbler I know you mentioned using ROCm and PyTorch 1.13.1, but did you install with the wheels package or was it built from source? It seems likely there are some major differences between your build of PyTorch and the build @ddvarpdd is using.

Built from source: I used the PKGBUILD from https://github.com/rocm-arch/python-pytorch-rocm and compiled with the standard Arch Linux tool, makepkg.

Should I stop filing tracebacks with errors?

P.S. The reason for compiling from source is the officially unsupported architecture gfx1031. P.P.S. I haven't tested the latest release yet.

fractal-fumbler avatar Jan 10 '23 12:01 fractal-fumbler

@ClashSAN Sorry to bother you with this, but would you be able to test this PR on CUDA? More specifically, if you could try --precision upcast (without --no-half) and see if you get any errors when generating images it would be much appreciated.

@ddvarpdd Unfortunately I'm going to have to put support for --precision upcast with the official PyTorch ROCm wheel on hold for the time being. If PyTorch 2.0 doesn't work correctly then I can take another look at it, but the official ROCm build of 1.13.1 just seems to be too temperamental when working with different dtypes. In the meantime, if you want to get it working then you can try building a newer PyTorch from source. Sorry I don't have better news in that regard.

should i stop filing tracebacks with errors?

@fractal-fumbler Please continue to post tracebacks when you have them. The tracebacks are in fact the only way I can get much of an idea of what is going on, so try to avoid posting error messages without the accompanying traceback if you have one. Did you encounter more errors, or is --upcast-attn working for you now?

brkirch avatar Jan 10 '23 13:01 brkirch

Thank you for your time. I will be sure to test it out when PyTorch 2.0 support is a bit more finalized, both in webui and ROCm.

ddvarpdd avatar Jan 10 '23 13:01 ddvarpdd

@fractal-fumbler Please continue to post tracebacks when you have them. Actually the tracebacks are the only way I can have much of any idea what is going on, so try to avoid posting error messages without tracebacks if you have a traceback. Did you encounter more errors or is --upcast-attn working for you now?

With the latest commit 9b12dca043abb04d2b6c732b3530dfd443229549 I still encounter the error RuntimeError: expected scalar type Half but found Float

Traceback
Traceback (most recent call last):
  File "/tmp/stable-diffusion-webui/modules/call_queue.py", line 45, in f
    res = list(func(*args, **kwargs))
  File "/tmp/stable-diffusion-webui/modules/call_queue.py", line 28, in f
    res = func(*args, **kwargs)
  File "/tmp/stable-diffusion-webui/modules/txt2img.py", line 54, in txt2img
    processed = process_images(p)
  File "/tmp/stable-diffusion-webui/modules/processing.py", line 488, in process_images
    res = process_images_inner(p)
  File "/tmp/stable-diffusion-webui/modules/processing.py", line 617, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "/tmp/stable-diffusion-webui/modules/processing.py", line 806, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x), mimic_scale=self.mimic_scale, threshold_enable=self.threshold_enable)
  File "/tmp/stable-diffusion-webui/modules/sd_samplers.py", line 574, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "/tmp/stable-diffusion-webui/modules/sd_samplers.py", line 475, in launch_sampling
    return func()
  File "/tmp/stable-diffusion-webui/modules/sd_samplers.py", line 574, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "/usr/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/tmp/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/sampling.py", line 145, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-webui/modules/sd_samplers.py", line 373, in forward
    x_out = self.inner_model(x_in, sigma_in, cond={"c_crossattn": [cond_in], "c_concat": [image_cond_in]})
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 167, in forward
    return self.get_v(input * c_in, self.sigma_to_t(sigma), **kwargs) * c_out + input * c_skip
  File "/tmp/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/external.py", line 177, in get_v
    return self.inner_model.apply_model(x, t, cond)
  File "/tmp/stable-diffusion-webui/modules/sd_hijack_unet.py", line 46, in apply_model
    return orig_apply_model(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, **kwargs).to(devices.dtype)
  File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 1329, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 776, in forward
    h = module(h, emb, context)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 84, in forward
    x = layer(x, context)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 334, in forward
    x = block(x, context=context[i])
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-webui/modules/sd_hijack_checkpoint.py", line 4, in BasicTransformerBlock_forward
    return checkpoint(self._forward, x, context)
  File "/usr/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/usr/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 272, in _forward
    x = self.attn1(self.norm1(x), context=context if self.disable_self_attn else None) + x
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-webui/modules/sd_hijack_optimizations.py", line 260, in sub_quad_attention_forward
    x = sub_quad_attention(q, k, v, q_chunk_size=shared.cmd_opts.sub_quad_q_chunk_size, kv_chunk_size=shared.cmd_opts.sub_quad_kv_chunk_size, chunk_threshold=shared.cmd_opts.sub_quad_chunk_threshold, use_checkpoint=self.training)
  File "/tmp/stable-diffusion-webui/modules/sd_hijack_optimizations.py", line 296, in sub_quad_attention
    return efficient_dot_product_attention(
  File "/tmp/stable-diffusion-webui/modules/sub_quadratic_attention.py", line 207, in efficient_dot_product_attention
    res = torch.cat([
  File "/tmp/stable-diffusion-webui/modules/sub_quadratic_attention.py", line 208, in <listcomp>
    compute_query_chunk_attn(
  File "/tmp/stable-diffusion-webui/modules/sub_quadratic_attention.py", line 132, in _get_attention_scores_no_kv_chunking
    hidden_states_slice = torch.bmm(attn_probs, value)
RuntimeError: expected scalar type Half but found Float
python version output
 python -c "import torch; print(torch.__version__); print(torch.randn(1).cuda())"
1.13.1
tensor([0.4277], device='cuda:0')

And thanks a lot for your time, I really appreciate it :)

Update 1: launch command

cmd
PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,roundup_power2_divisions:4,max_split_size_mb:128 PYTHONPATH=/tmp/stable-diffusion-webui python launch.py --theme=dark --opt-split-attention --opt-channelslast --always-batch-cond-uncond --medvram --opt-sub-quad-attention --precision upcast --upcast-attn 2>&1 | tee /tmp/log

fractal-fumbler avatar Jan 10 '23 17:01 fractal-fumbler

Tested it with PyTorch 2.0 on ROCm. Using the one built against ROCm 5.3 (2.0.0.dev20230111+rocm5.3) results in webui simply hanging and doing nothing after clicking generate. This happens even without this patch, so it's likely some upstream issue; I'm mentioning it just in case someone else stumbles upon it. The ROCm 5.2-based Torch (2.0.0.dev20230111+rocm5.2) works fine without the patch, but adding it and enabling --precision upcast once again results in RuntimeError: expected scalar type Half but found Float.

Traceback

Traceback (most recent call last):
  File "/home/d/webui/modules/call_queue.py", line 45, in f
    res = list(func(*args, **kwargs))
  File "/home/d/webui/modules/call_queue.py", line 28, in f
    res = func(*args, **kwargs)
  File "/home/d/webui/modules/txt2img.py", line 52, in txt2img
    processed = process_images(p)
  File "/home/d/webui/modules/processing.py", line 479, in process_images
    res = process_images_inner(p)
  File "/home/d/webui/modules/processing.py", line 597, in process_images_inner
    uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps, cached_uc)
  File "/home/d/webui/modules/processing.py", line 565, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps)
  File "/home/d/webui/modules/prompt_parser.py", line 140, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/modules/sd_hijack_clip.py", line 220, in forward
    z = self.process_tokens(tokens, multipliers)
  File "/home/d/webui/modules/sd_hijack_clip.py", line 245, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "/home/d/webui/modules/sd_hijack_clip.py", line 293, in encode_with_transformers
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 722, in forward
    return self.text_model(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 643, in forward
    encoder_outputs = self.encoder(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 574, in forward
    layer_outputs = encoder_layer(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 317, in forward
    hidden_states, attn_weights = self.self_attn(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 257, in forward
    attn_output = torch.bmm(attn_probs, value_states)
RuntimeError: expected scalar type Half but found Float

No luck when reverting 9b12dca, either.

Traceback

Traceback (most recent call last):
  File "/home/d/webui/modules/call_queue.py", line 45, in f
    res = list(func(*args, **kwargs))
  File "/home/d/webui/modules/call_queue.py", line 28, in f
    res = func(*args, **kwargs)
  File "/home/d/webui/modules/txt2img.py", line 52, in txt2img
    processed = process_images(p)
  File "/home/d/webui/modules/processing.py", line 479, in process_images
    res = process_images_inner(p)
  File "/home/d/webui/modules/processing.py", line 598, in process_images_inner
    c = get_conds_with_caching(prompt_parser.get_multicond_learned_conditioning, prompts, p.steps, cached_c)
  File "/home/d/webui/modules/processing.py", line 565, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps)
  File "/home/d/webui/modules/prompt_parser.py", line 205, in get_multicond_learned_conditioning
    learned_conditioning = get_learned_conditioning(model, prompt_flat_list, steps)
  File "/home/d/webui/modules/prompt_parser.py", line 140, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/modules/sd_hijack_clip.py", line 221, in forward
    z = self.process_tokens(tokens, multipliers)
  File "/home/d/webui/modules/sd_hijack_clip.py", line 246, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "/home/d/webui/modules/sd_hijack_clip.py", line 294, in encode_with_transformers
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 722, in forward
    return self.text_model(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 643, in forward
    encoder_outputs = self.encoder(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 574, in forward
    layer_outputs = encoder_layer(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 316, in forward
    hidden_states = self.layer_norm1(hidden_states)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 190, in forward
    return F.layer_norm(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found Half

ddvarpdd avatar Jan 11 '23 16:01 ddvarpdd

I've taken another attempt at fixing the ROCm issues.

@fractal-fumbler Try again to see if it is working for you now, and if it is please also try some larger image sizes to see if you encounter errors.

@ddvarpdd You can try again with PyTorch 1.13.1; I'm going to take another try at getting it working. Just let me know if you encounter any more errors.

brkirch avatar Jan 12 '23 13:01 brkirch

Neat! I was able to generate a picture with txt2img using this command

Command
PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,roundup_power2_divisions:4,max_split_size_mb:128 PYTHONPATH=/tmp/stable-diffusion-webui python launch.py --theme=dark --opt-split-attention --opt-channelslast --always-batch-cond-uncond --medvram --opt-sub-quad-attention --precision upcast --upcast-attn

VRAM consumption for a 768x768 picture dropped from 6.7 GB to 5.2 GB

I've taken another attempt at fixing the ROCm issues.

@fractal-fumbler Try again to see if it is working for you now, and if it is please also try some larger image sizes to see if you encounter errors.

Though a new issue arose: the error RuntimeError: expected scalar type Float but found Half when a textual inversion embedding is used in the prompt: test prompt with classipeint as embedding

Traceback
Arguments: ('test prompt with classipeint as embedding', '', 'None', 'None', 20, 9, False, False, 1, 1, 7, 7.5, False, 2282081976.0, -1.0, 0, 0, 0, False, 768, 768, False, 0.7, 2, 'Latent', 0, 0, 0, 0, None, '', 'outputs', False, False, False, False, '', 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
  File "/tmp/stable-diffusion-webui/modules/call_queue.py", line 45, in f
    res = list(func(*args, **kwargs))
  File "/tmp/stable-diffusion-webui/modules/call_queue.py", line 28, in f
    res = func(*args, **kwargs)
  File "/tmp/stable-diffusion-webui/modules/txt2img.py", line 54, in txt2img
    processed = process_images(p)
  File "/tmp/stable-diffusion-webui/modules/processing.py", line 488, in process_images
    res = process_images_inner(p)
  File "/tmp/stable-diffusion-webui/modules/processing.py", line 607, in process_images_inner
    c = get_conds_with_caching(prompt_parser.get_multicond_learned_conditioning, prompts, p.steps, cached_c)
  File "/tmp/stable-diffusion-webui/modules/processing.py", line 574, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps)
  File "/tmp/stable-diffusion-webui/modules/prompt_parser.py", line 205, in get_multicond_learned_conditioning
    learned_conditioning = get_learned_conditioning(model, prompt_flat_list, steps)
  File "/tmp/stable-diffusion-webui/modules/prompt_parser.py", line 140, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
    c = self.cond_stage_model(c)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-webui/modules/sd_hijack_clip.py", line 222, in forward
    z = self.process_tokens(tokens, multipliers)
  File "/tmp/stable-diffusion-webui/modules/sd_hijack_clip.py", line 247, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "/tmp/stable-diffusion-webui/modules/sd_hijack_open_clip.py", line 30, in encode_with_transformers
    z = self.wrapped.encode_with_transformer(tokens)
  File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/encoders/modules.py", line 177, in encode_with_transformer
    x = self.text_transformer_forward(x, attn_mask=self.model.attn_mask)
  File "/tmp/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/encoders/modules.py", line 189, in text_transformer_forward
    x = r(x, attn_mask=attn_mask)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-webui/auto/lib/python3.10/site-packages/open_clip/transformer.py", line 193, in forward
    x = x + self.ls_1(self.attention(self.ln_1(x), attn_mask=attn_mask))
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-webui/auto/lib/python3.10/site-packages/open_clip/transformer.py", line 27, in forward
    x = F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
  File "/usr/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found Half

I downloaded the embedding from https://huggingface.co/EldritchAdam/classipeint

fractal-fumbler avatar Jan 12 '23 17:01 fractal-fumbler

Tested on both 1.13.1 and the same 2.0.0 nightly, with no embeddings used; a similar error on both, but with a much larger traceback.

1.13.1

Traceback (most recent call last):
  File "/home/d/webui/modules/call_queue.py", line 45, in f
    res = list(func(*args, **kwargs))
  File "/home/d/webui/modules/call_queue.py", line 28, in f
    res = func(*args, **kwargs)
  File "/home/d/webui/modules/txt2img.py", line 52, in txt2img
    processed = process_images(p)
  File "/home/d/webui/modules/processing.py", line 479, in process_images
    res = process_images_inner(p)
  File "/home/d/webui/modules/processing.py", line 608, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "/home/d/webui/modules/processing.py", line 797, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "/home/d/webui/modules/sd_samplers.py", line 537, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "/home/d/webui/modules/sd_samplers.py", line 440, in launch_sampling
    return func()
  File "/home/d/webui/modules/sd_samplers.py", line 537, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/d/webui/repositories/k-diffusion/k_diffusion/sampling.py", line 594, in sample_dpmpp_2m
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/modules/sd_samplers.py", line 351, in forward
    x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond={"c_crossattn": [tensor[a:b]], "c_concat": [image_cond_in[a:b]]})
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/repositories/k-diffusion/k_diffusion/external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "/home/d/webui/repositories/k-diffusion/k_diffusion/external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "/home/d/webui/modules/sd_hijack_unet.py", line 46, in apply_model
    return orig_apply_model(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, **kwargs).to(devices.dtype)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 1329, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 776, in forward
    h = module(h, emb, context)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 82, in forward
    x = layer(x, emb)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/modules/sd_hijack_checkpoint.py", line 10, in ResBlock_forward
    return checkpoint(self._forward, x, emb)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 262, in _forward
    h = self.in_layers(x)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/util.py", line 219, in forward
    return super().forward(x.float()).type(x.dtype)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 273, in forward
    return F.group_norm(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2528, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found Half
2.0.0.dev20230111

Traceback (most recent call last):
  File "/home/d/webui/modules/call_queue.py", line 45, in f
    res = list(func(*args, **kwargs))
  File "/home/d/webui/modules/call_queue.py", line 28, in f
    res = func(*args, **kwargs)
  File "/home/d/webui/modules/txt2img.py", line 52, in txt2img
    processed = process_images(p)
  File "/home/d/webui/modules/processing.py", line 479, in process_images
    res = process_images_inner(p)
  File "/home/d/webui/modules/processing.py", line 608, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "/home/d/webui/modules/processing.py", line 797, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "/home/d/webui/modules/sd_samplers.py", line 537, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "/home/d/webui/modules/sd_samplers.py", line 440, in launch_sampling
    return func()
  File "/home/d/webui/modules/sd_samplers.py", line 537, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/d/webui/repositories/k-diffusion/k_diffusion/sampling.py", line 594, in sample_dpmpp_2m
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/modules/sd_samplers.py", line 351, in forward
    x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond={"c_crossattn": [tensor[a:b]], "c_concat": [image_cond_in[a:b]]})
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/repositories/k-diffusion/k_diffusion/external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "/home/d/webui/repositories/k-diffusion/k_diffusion/external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "/home/d/webui/modules/sd_hijack_unet.py", line 46, in apply_model
    return orig_apply_model(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, **kwargs).to(devices.dtype)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 1329, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 776, in forward
    h = module(h, emb, context)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 82, in forward
    x = layer(x, emb)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/modules/sd_hijack_checkpoint.py", line 10, in ResBlock_forward
    return checkpoint(self._forward, x, emb)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 453, in apply
    return super().apply(*args, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 262, in _forward
    h = self.in_layers(x)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/d/webui/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/util.py", line 219, in forward
    return super().forward(x.float()).type(x.dtype)
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 273, in forward
    return F.group_norm(
  File "/home/d/webui/venv/lib/python3.10/site-packages/torch/nn/functional.py", line 2530, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: mixed dtype (CPU): expect parameter to have scalar type of Float

ddvarpdd avatar Jan 12 '23 19:01 ddvarpdd

Testing further: no-half vs upcast :)

Examples with --no-half --precision full --no-half-vae

[image: photo_2023-01-13_01-55-21]

Examples with --precision upcast --upcast-attn

[image: photo_2023-01-13_01-55-25]

fractal-fumbler avatar Jan 12 '23 22:01 fractal-fumbler

@fractal-fumbler @ddvarpdd I've renamed --precision upcast to --upcast-sampling. Please try using --upcast-sampling without --precision. This should reenable autocast, which will hopefully fix any remaining issues (or it may break everything; I have no way to be sure unless someone with ROCm tests it). That said, even if it seems to work: if @fractal-fumbler has any images from before this change that can be regenerated, it would be a good idea to check that regenerating results in an identical image.
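
For anyone unfamiliar, "reenable autocast" refers to PyTorch's torch.autocast context, which selects per-op dtypes automatically instead of forcing one precision everywhere. A minimal illustration (requires a CUDA or ROCm device; ROCm builds of PyTorch also use the 'cuda' device type):

```python
import torch

x = torch.randn(8, 16, device='cuda', dtype=torch.float16)
linear = torch.nn.Linear(16, 16, device='cuda', dtype=torch.float16)

with torch.autocast('cuda'):
    y = linear(x)  # matmul-heavy ops run in float16 under autocast
    z = torch.nn.functional.layer_norm(y, (16,))  # layer_norm is autocast to float32

print(y.dtype, z.dtype)  # torch.float16 torch.float32
```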

brkirch avatar Jan 14 '23 07:01 brkirch

Please try using --upcast-sampling without --precision

@brkirch, the latest patch (96093475731c2f95fba3911ee66e5065deb21005) with

  1. --opt-split-attention --opt-channelslast --always-batch-cond-uncond --medvram --opt-sub-quad-attention --upcast-sampling --upcast-attn gives black images

  2. --opt-split-attention --opt-channelslast --always-batch-cond-uncond --medvram --opt-sub-quad-attention --precision upcast --upcast-attn --precision full gives the error RuntimeError: expected scalar type Half but found Float

Traceback
Traceback (most recent call last):
  File "/tmp/stable-diffusion-20/modules/call_queue.py", line 45, in f
    res = list(func(*args, **kwargs))
  File "/tmp/stable-diffusion-20/modules/call_queue.py", line 28, in f
    res = func(*args, **kwargs)
  File "/tmp/stable-diffusion-20/modules/txt2img.py", line 54, in txt2img
    processed = process_images(p)
  File "/tmp/stable-diffusion-20/modules/processing.py", line 488, in process_images
    res = process_images_inner(p)
  File "/tmp/stable-diffusion-20/modules/processing.py", line 617, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "/tmp/stable-diffusion-20/modules/processing.py", line 806, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x), mimic_scale=self.mimic_scale, threshold_enable=self.threshold_enable)
  File "/tmp/stable-diffusion-20/modules/sd_samplers.py", line 574, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "/tmp/stable-diffusion-20/modules/sd_samplers.py", line 475, in launch_sampling
    return func()
  File "/tmp/stable-diffusion-20/modules/sd_samplers.py", line 574, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "/usr/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/tmp/stable-diffusion-20/repositories/k-diffusion/k_diffusion/sampling.py", line 145, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-20/modules/sd_samplers.py", line 373, in forward
    x_out = self.inner_model(x_in, sigma_in, cond={"c_crossattn": [cond_in], "c_concat": [image_cond_in]})
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-20/repositories/k-diffusion/k_diffusion/external.py", line 167, in forward
    return self.get_v(input * c_in, self.sigma_to_t(sigma), **kwargs) * c_out + input * c_skip
  File "/tmp/stable-diffusion-20/repositories/k-diffusion/k_diffusion/external.py", line 177, in get_v
    return self.inner_model.apply_model(x, t, cond)
  File "/tmp/stable-diffusion-20/modules/sd_hijack_unet.py", line 46, in apply_model
    return orig_apply_model(self, x_noisy.to(devices.dtype_unet), t.to(devices.dtype_unet), cond, **kwargs).to(torch.float32)
  File "/tmp/stable-diffusion-20/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-20/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 1329, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-20/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 776, in forward
    h = module(h, emb, context)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-20/repositories/stable-diffusion-stability-ai/ldm/modules/diffusionmodules/openaimodel.py", line 84, in forward
    x = layer(x, context)
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-20/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 334, in forward
    x = block(x, context=context[i])
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-20/modules/sd_hijack_checkpoint.py", line 4, in BasicTransformerBlock_forward
    return checkpoint(self._forward, x, context)
  File "/usr/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/usr/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/tmp/stable-diffusion-20/repositories/stable-diffusion-stability-ai/ldm/modules/attention.py", line 272, in _forward
    x = self.attn1(self.norm1(x), context=context if self.disable_self_attn else None) + x
  File "/usr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/tmp/stable-diffusion-20/modules/sd_hijack_optimizations.py", line 260, in sub_quad_attention_forward
    x = sub_quad_attention(q, k, v, q_chunk_size=shared.cmd_opts.sub_quad_q_chunk_size, kv_chunk_size=shared.cmd_opts.sub_quad_kv_chunk_size, chunk_threshold=shared.cmd_opts.sub_quad_chunk_threshold, use_checkpoint=self.training)
  File "/tmp/stable-diffusion-20/modules/sd_hijack_optimizations.py", line 296, in sub_quad_attention
    return efficient_dot_product_attention(
  File "/tmp/stable-diffusion-20/modules/sub_quadratic_attention.py", line 207, in efficient_dot_product_attention
    res = torch.cat([
  File "/tmp/stable-diffusion-20/modules/sub_quadratic_attention.py", line 208, in <listcomp>
    compute_query_chunk_attn(
  File "/tmp/stable-diffusion-20/modules/sub_quadratic_attention.py", line 132, in _get_attention_scores_no_kv_chunking
    hidden_states_slice = torch.bmm(attn_probs, value)
RuntimeError: expected scalar type Half but found Float
  3. --opt-split-attention --opt-channelslast --always-batch-cond-uncond --medvram --opt-sub-quad-attention --upcast-sampling gives black images too

usually black images on SD-2.1 are generated when --no-half (i.e. fp32) isn't used in the generation process

fractal-fumbler avatar Jan 14 '23 10:01 fractal-fumbler
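The failure at the bottom of that traceback is a plain dtype mismatch: float32 attention probabilities multiplied against a half-precision value tensor in torch.bmm. A hypothetical fix sketch (not necessarily how this PR ends up resolving it) is to cast back to the value's dtype before the final matmul:

import torch

def finish_attention(attn_probs: torch.Tensor, value: torch.Tensor) -> torch.Tensor:
    # Cast the (possibly float32) attention probabilities back to the value
    # tensor's dtype so torch.bmm sees matching operand dtypes.
    return torch.bmm(attn_probs.to(value.dtype), value)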

Disappointing, but I did think there was a decent chance it wouldn't work, so it was definitely worth a try. Thank you for testing. I'll revert those changes for now and continue working on a fix for the embeddings error and the error @ddvarpdd is getting.

brkirch avatar Jan 14 '23 10:01 brkirch

hadn't checked before, but FYI: 96093475731c2f95fba3911ee66e5065deb21005 and 280083c9c30058d092c6b6f6aadac5e669b322fc didn't throw an error with an embedding being used, though they still give black images

fractal-fumbler avatar Jan 14 '23 11:01 fractal-fumbler

@ddvarpdd is your model SD-1.5, SD-2.0, or SD-2.1?

fractal-fumbler avatar Jan 14 '23 12:01 fractal-fumbler

so, @brkirch, 96093475731c2f95fba3911ee66e5065deb21005 is working for me on SD-1.5 and SD-2.0 (on PyTorch 1.13.1), including embeddings :)

also tested on 280083c9c30058d092c6b6f6aadac5e669b322fc - works

getting increased speed with Euler a: from 1.3 it/s to 1.6 it/s for 768x768 image generation

p.s. sorry, that wasn't tested on SD-2.0 or SD-1.5

fractal-fumbler avatar Jan 14 '23 13:01 fractal-fumbler

Sounds like getting autocast turned back on did the trick after all! Thank you both for testing!

@ddvarpdd Testing on PyTorch 1.13.1 would probably be a good idea, as 2.0 is probably the cause of the black squares. Also, if you could try commit 96093475731c2f95fba3911ee66e5065deb21005 with an SD 1.5 model and see if it works, I'd much appreciate it.

brkirch avatar Jan 14 '23 13:01 brkirch

Accidentally bumped the “Close with comment” button (I’m on a phone right now).

brkirch avatar Jan 14 '23 13:01 brkirch

so the AMD (ROCm) problem with SD-2.1 is that xformers isn't available, and AMD users can't force SD-2.1 to use fp16, since that only works with xformers installed.

From https://github.com/Stability-AI/stablediffusion: "Per default, the attention operation of the model is evaluated at full precision when xformers is not installed. To enable fp16 (which can cause numerical instabilities with the vanilla attention module on the v2.1 model), run your script with ATTN_PRECISION=fp16 python"

that's why generation on SD-2.1 for AMD users is only possible with --no-half

fractal-fumbler avatar Jan 14 '23 13:01 fractal-fumbler

@fractal-fumbler Cross attention layer optimizations actually override that by monkey patching CrossAttention.forward(). I suspect the issue you were seeing is due to autocast evaluating entirely at float16 precision rather than evaluating at float32 precision and downcasting. Fortunately, it's not hard to fix.

@ddvarpdd The most recent change defaults a few more things to float16, so it could be causing a few issues. I haven't had the chance to test the upscalers just yet. It does, however, sound like the problem is probably caused by running out of memory. If you haven't already, try running without the new options and see if you get the same issue. If you do, then it is probably not directly related to this PR.

brkirch avatar Jan 14 '23 14:01 brkirch
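To illustrate what upcasting attention means here, a sketch of the general technique (not this PR's exact code): the q/k similarity and the softmax are computed in float32 to avoid fp16 overflow, and the result is cast back to half precision before multiplying by v.

import torch

# Sketch of upcast attention (general technique, not the PR's exact code):
# compute similarity and softmax in float32, then return to the value
# tensor's dtype for the final matmul.
def attention_upcast(q, k, v, scale):
    sim = torch.bmm(q.float(), k.float().transpose(-1, -2)) * scale
    attn = sim.softmax(dim=-1).to(v.dtype)
    return torch.bmm(attn, v)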

after pulling the latest commits up to 96093475731c2f95fba3911ee66e5065deb21005, I'm getting VRAM usage of around 4.7-4.8 GB when generating at 768x768 and at 1024x768 (hires fix). upd: w/o --medvram it's more like 5.0-5.1 GB VRAM usage

using an SD-2.0 model tho

comparison of speed (it/s | s/it): I didn't notice any big difference with the brkirch patch; it's like a 0.2-0.5 second difference.

fractal-fumbler avatar Jan 14 '23 18:01 fractal-fumbler

@fractal-fumbler See if the latest change prevents black images with --upcast-attn.

brkirch avatar Jan 16 '23 23:01 brkirch