RuntimeError: Input type (c10::Half) and bias type (float) should be the same
I am using an M1 Pro MacBook and I am trying to run Stable Diffusion using MPS.
I changed the CUDA-specific parts to MPS and switched ddim.py to float32, because MPS does not support float64:
def register_buffer(self, name, attr):
    if type(attr) == torch.Tensor:
        if attr.device != torch.device("mps"):
            attr = attr.to(torch.float32).to(torch.device("mps"))
    setattr(self, name, attr)

def make_schedule(self, ddim_num_steps, ddim_discretize="uniform", ddim_eta=0., verbose=True):
    self.ddim_timesteps = make_ddim_timesteps(ddim_discr_method=ddim_discretize, num_ddim_timesteps=ddim_num_steps,
                                              num_ddpm_timesteps=self.ddpm_num_timesteps, verbose=verbose)
    alphas_cumprod = self.model.alphas_cumprod
    assert alphas_cumprod.shape[0] == self.ddpm_num_timesteps, 'alphas have to be defined for each timestep'
    to_torch = lambda x: x.clone().detach().to(torch.float32).to(self.model.device)
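For context, the float32 cast in register_buffer above is there because MPS rejects double-precision tensors; a quick sketch of the failure and the workaround (the exact exception text varies with the PyTorch version):

import torch

# Assumes Apple Silicon with the MPS backend available.
if torch.backends.mps.is_available():
    x64 = torch.ones(3, dtype=torch.float64)
    try:
        x64.to("mps")                        # float64 tensors cannot live on MPS
    except (TypeError, RuntimeError) as e:   # exception type varies by PyTorch version
        print(e)
    x32 = x64.to(torch.float32).to("mps")    # cast first, then move: this works
    print(x32.dtype, x32.device)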
Since making that change, this error has been raised from conv.py:
def _conv_forward(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]):
    if self.padding_mode != 'zeros':
        return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
                        weight, bias, self.stride,
                        _pair(0), self.dilation, self.groups)
    return F.conv2d(input, weight, bias, self.stride,
                    self.padding, self.dilation, self.groups)

def forward(self, input: Tensor) -> Tensor:
    return self._conv_forward(input, self.weight, self.bias)
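The error itself only means that the tensors reaching F.conv2d do not share a dtype, here a half-precision input or weight meeting a float32 bias. A minimal, hypothetical reproduction and the usual remedy of keeping the module and its inputs in one dtype (the exact error wording depends on backend and PyTorch version):

import torch
import torch.nn as nn

conv = nn.Conv2d(4, 4, kernel_size=3, padding=1)   # weight and bias start as float32
conv.weight.data = conv.weight.data.half()         # weight -> float16, bias left as float32
x = torch.randn(1, 4, 8, 8, dtype=torch.float16)

try:
    conv(x)                                         # mixed dtypes inside a single conv call
except RuntimeError as e:
    print(e)                                        # a dtype-mismatch error similar to the one above

# Remedy sketch: keep the whole module and its inputs in a single dtype.
conv = conv.float()
out = conv(x.float())
print(out.dtype)                                    # torch.float32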
Help me please.
Hey, how is it going? Did you figure it out?
No, I didn't 😢
Well, that's too bad. I guess we're stuck then.
Are you having the same problem?
I was having that problem when I altered superresolution.py for my use case. You could try running the pipeline provided by diffusers instead (see the sketch below the error). Since this issue got no response, I switched to running txt2img.py, and now I am getting a new error:
(base) root@stablediffusion$ python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt v2-1_768-nonema-pruned.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768
"RuntimeError: expected scalar type BFloat16 but found Float"
Wait, you're getting the problem from something else now, not the superresolution.py script?
OK, I was fooling around and got the "RuntimeError: Input type (c10::Half) and bias type (float) should be the same" error again. It doesn't show up when you use "cuda". Why are you trying MPS instead?
You can use this fork, which supports MPS! https://github.com/Tps-F/stablediffusion
It doesn't work 🥺
This error occurred:
Traceback (most recent call last):
File "/Users/blackcat/study/stablediffusion/scripts/txt2img.py", line 393, in
After days of troubleshooting, I was able to resolve this by upgrading tensorflow to 2.11.0 and setting the use_fp16 parameter in v2-inference.yaml to False.
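If you would rather not edit the YAML by hand, the same override can be applied in code before the model is built; a rough sketch using OmegaConf (which the repo's scripts already use to load these configs), where the exact key path is my guess and may differ between config files:

from omegaconf import OmegaConf

config = OmegaConf.load("configs/stable-diffusion/v2-inference.yaml")

# Hypothetical key path; print the loaded config if your file is laid out differently.
config.model.params.unet_config.params.use_fp16 = False

# The patched config is then passed to the usual model loader, e.g.
# model = load_model_from_config(config, "v2-1_768-nonema-pruned.ckpt")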
Try to use v2-inference-v-mac.yaml
@lakejee-rebel @Tps-F How long does it take to execute for you? It takes an hour to create an image with Tps-F's Stable Diffusion fork.
Here it is!
https://github.com/Stability-AI/stablediffusion/pull/163#issuecomment-1422351441
@Tps-F It's faster now because I reduced the batch size. Thank you. Are you interested in object detection models like SSD (Single Shot MultiBox Detector) or YOLO? I want to try SSD on an M1 Mac, but that model's code targets CUDA. How do I convert the CUDA parts to MPS?
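(For anyone reading along: the usual first step when porting CUDA-only code to Apple Silicon is to stop hard-coding .cuda() and pick the device at runtime; a generic sketch, not tied to any particular SSD or YOLO implementation:)

import torch

# Pick the best available backend instead of assuming CUDA.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Then replace every `model.cuda()` / `tensor.cuda()` call with `.to(device)`:
# model = model.to(device)
# images = images.to(device)
# Ops that MPS does not implement can be routed to CPU by setting
# PYTORCH_ENABLE_MPS_FALLBACK=1 in the environment before launching Python.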
Shall I do it?
I'd appreciate it if you did that.
@Tps-F Can I follow you?
Sure! By the way, there seem to be multiple versions of SSD and YOLO; which one should I support?
Since we shouldn't keep chatting here, would you like to move to Discord or something?
Okay, good. What is your Discord ID? I will follow you.
Thank you- Ftps#3389
@Tps-F Hi, I get a similar error using your fork:
.../venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (float) and bias type (c10::Half) should be the same
I used your v2-inference-v-mac.yaml as well and updated tensorflow to 2.11.0 as suggested, but it doesn't work...
Can I connect with you on Discord? I already sent a request... :)
I would like to see all the logs and what you have run. Can you show me?
Sure! But I might as well talk about it here in case anyone encounters a similar error in the future!
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
I also got the above problem, @yyahav. I am using Ubuntu, not macOS.
@tommysnu Can you please share the entire stacktrace? I've made a change in the code which seems to work for me
Likewise, please share your logs with us so I can improve.
Traceback (most recent call last):
File "/mnt/workspace/stablediffusion/scripts/txt2img.py", line 388, in <module>
main(opt)
File "/mnt/workspace/stablediffusion/scripts/txt2img.py", line 347, in main
samples, _ = sampler.sample(S=opt.steps,
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py", line 104, in sample
samples, intermediates = self.ddim_sampling(conditioning, size,
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py", line 164, in ddim_sampling
outs = self.p_sample_ddim(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py", line 212, in p_sample_ddim
model_uncond, model_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddpm.py", line 858, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddpm.py", line 1335, in forward
out = self.diffusion_model(x, t, context=cc)
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/openaimodel.py", line 797, in forward
h = module(h, emb, context)
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/openaimodel.py", line 86, in forward
x = layer(x)
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
These are my logs after running:
python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt v2-1_768-ema-pruned.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768
(Link: https://github.com/Stability-AI/stablediffusion#reference-sampling-script)
Could you give me any suggestions, @yyahav and @Tps-F? Thank you so much.
I know you are using Ubuntu, but could you try using the config for Mac? https://github.com/Tps-F/stablediffusion/blob/mps-cpu-support/configs/stable-diffusion/mac/v2-inference-v-mac.yaml
I think the reason this happens is that you are using fp16.
Thanks Tps-F. After using this config file I get another error, as below:
Sampling:   0%|          | 0/3 [00:00<?, ?it/s]
Data shape for DDIM sampling is (3, 4, 96, 96), eta 0.0
Running DDIM Sampling with 50 timesteps
DDIM Sampler: 0%| | 0/50 [00:00<?, ?it/s]
data: 0%| | 0/1 [00:02<?, ?it/s]
Sampling: 0%| | 0/3 [00:02<?, ?it/s]
╭─────────────────────────── Traceback (most recent call last) ───────────────────────────╮
│ /mnt/workspace/stablediffusion/scripts/txt2img.py:388 in <module> │
│ │
│ 385 │
│ 386 if __name__ == "__main__": │
│ 387 │ opt = parse_args() │
│ ❱ 388 │ main(opt) │
│ 389 │
│ │
│ /mnt/workspace/stablediffusion/scripts/txt2img.py:347 in main │
│ │
│ 344 │ │ │ │ │ │ prompts = list(prompts) │
│ 345 │ │ │ │ │ c = model.get_learned_conditioning(prompts) │
│ 346 │ │ │ │ │ shape = [opt.C, opt.H // opt.f, opt.W // opt.f] │
│ ❱ 347 │ │ │ │ │ samples, _ = sampler.sample(S=opt.steps, │
│ 348 │ │ │ │ │ │ │ │ │ │ │ │ │ conditioning=c, │
│ 349 │ │ │ │ │ │ │ │ │ │ │ │ │ batch_size=opt.n_samples, │
│ 350 │ │ │ │ │ │ │ │ │ │ │ │ │ shape=shape, │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py │
│ :27 in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ ❱ 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ /mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py:104 in sample │
│ │
│ 101 │ │ size = (batch_size, C, H, W) │
│ 102 │ │ print(f'Data shape for DDIM sampling is {size}, eta {eta}') │
│ 103 │ │ │
│ ❱ 104 │ │ samples, intermediates = self.ddim_sampling(conditioning, size, │
│ 105 │ │ │ │ │ │ │ │ │ │ │ │ │ callback=callback, │
│ 106 │ │ │ │ │ │ │ │ │ │ │ │ │ img_callback=img_callback, │
│ 107 │ │ │ │ │ │ │ │ │ │ │ │ │ quantize_denoised=quantize_x0 │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py │
│ :27 in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ ❱ 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ /mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py:164 in ddim_sampling │
│ │
│ 161 │ │ │ │ assert len(ucg_schedule) == len(time_range) │
│ 162 │ │ │ │ unconditional_guidance_scale = ucg_schedule[i] │
│ 163 │ │ │ │
│ ❱ 164 │ │ │ outs = self.p_sample_ddim(img, cond, ts, index=index, use_original_st │
│ 165 │ │ │ │ │ │ │ │ │ quantize_denoised=quantize_denoised, temper │
│ 166 │ │ │ │ │ │ │ │ │ noise_dropout=noise_dropout, score_correcto │
│ 167 │ │ │ │ │ │ │ │ │ corrector_kwargs=corrector_kwargs, │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py │
│ :27 in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ ❱ 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ /mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py:212 in p_sample_ddim │
│ │
│ 209 │ │ │ │ │ c_in.append(torch.cat([unconditional_conditioning[i], c[i]])) │
│ 210 │ │ │ else: │
│ 211 │ │ │ │ c_in = torch.cat([unconditional_conditioning, c]) │
│ ❱ 212 │ │ │ model_uncond, model_t = self.model.apply_model(x_in, t_in, c_in).chun │
│ 213 │ │ │ model_output = model_uncond + unconditional_guidance_scale * (model_t │
│ 214 │ │ │
│ 215 │ │ if self.model.parameterization == "v": │
│ │
│ /mnt/workspace/stablediffusion/ldm/models/diffusion/ddpm.py:858 in apply_model │
│ │
│ 855 │ │ │ key = 'c_concat' if self.model.conditioning_key == 'concat' else 'c_ │
│ 856 │ │ │ cond = {key: cond} │
│ 857 │ │ │
│ ❱ 858 │ │ x_recon = self.model(x_noisy, t, **cond) │
│ 859 │ │ │
│ 860 │ │ if isinstance(x_recon, tuple) and not return_ids: │
│ 861 │ │ │ return x_recon[0] │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py: │
│ 1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /mnt/workspace/stablediffusion/ldm/models/diffusion/ddpm.py:1335 in forward │
│ │
│ 1332 │ │ │ │ # an error: RuntimeError: forward() is missing value for argumen │
│ 1333 │ │ │ │ out = self.scripted_diffusion_model(x, t, cc) │
│ 1334 │ │ │ else: │
│ ❱ 1335 │ │ │ │ out = self.diffusion_model(x, t, context=cc) │
│ 1336 │ │ elif self.conditioning_key == 'hybrid': │
│ 1337 │ │ │ xc = torch.cat([x] + c_concat, dim=1) │
│ 1338 │ │ │ cc = torch.cat(c_crossattn, 1) │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py: │
│ 1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/openaimodel.py:797 in │
│ forward │
│ │
│ 794 │ │ │
│ 795 │ │ h = x.type(self.dtype) │
│ 796 │ │ for module in self.input_blocks: │
│ ❱ 797 │ │ │ h = module(h, emb, context) │
│ 798 │ │ │ hs.append(h) │
│ 799 │ │ h = self.middle_block(h, emb, context) │
│ 800 │ │ for module in self.output_blocks: │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py: │
│ 1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/openaimodel.py:84 in │
│ forward │
│ │
│ 81 │ │ │ if isinstance(layer, TimestepBlock): │
│ 82 │ │ │ │ x = layer(x, emb) │
│ 83 │ │ │ elif isinstance(layer, SpatialTransformer): │
│ ❱ 84 │ │ │ │ x = layer(x, context) │
│ 85 │ │ │ else: │
│ 86 │ │ │ │ x = layer(x) │
│ 87 │ │ return x │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py: │
│ 1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /mnt/workspace/stablediffusion/ldm/modules/attention.py:334 in forward │
│ │
│ 331 │ │ if self.use_linear: │
│ 332 │ │ │ x = self.proj_in(x) │
│ 333 │ │ for i, block in enumerate(self.transformer_blocks): │
│ ❱ 334 │ │ │ x = block(x, context=context[i]) │
│ 335 │ │ if self.use_linear: │
│ 336 │ │ │ x = self.proj_out(x) │
│ 337 │ │ x = rearrange(x, 'b (h w) c -> b c h w', h=h, w=w).contiguous() │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py: │
│ 1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /mnt/workspace/stablediffusion/ldm/modules/attention.py:269 in forward │
│ │
│ 266 │ │ self.checkpoint = checkpoint │
│ 267 │ │
│ 268 │ def forward(self, x, context=None): │
│ ❱ 269 │ │ return checkpoint(self._forward, (x, context), self.parameters(), self.ch │
│ 270 │ │
│ 271 │ def _forward(self, x, context=None): │
│ 272 │ │ x = self.attn1(self.norm1(x), context=context if self.disable_self_attn e │
│ │
│ /mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/util.py:121 in checkpoint │
│ │
│ 118 │ """ │
│ 119 │ if flag: │
│ 120 │ │ args = tuple(inputs) + tuple(params) │
│ ❱ 121 │ │ return CheckpointFunction.apply(func, len(inputs), *args) │
│ 122 │ else: │
│ 123 │ │ return func(*inputs) │
│ 124 │
│ │
│ /mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/util.py:136 in forward │
│ │
│ 133 │ │ │ │ │ │ │ │ "dtype": torch.get_autocast_gpu_dtype(), │
│ 134 │ │ │ │ │ │ │ │ "cache_enabled": torch.is_autocast_cache_enabl │
│ 135 │ │ with torch.no_grad(): │
│ ❱ 136 │ │ │ output_tensors = ctx.run_function(*ctx.input_tensors) │
│ 137 │ │ return output_tensors │
│ 138 │ │
│ 139 │ @staticmethod │
│ │
│ /mnt/workspace/stablediffusion/ldm/modules/attention.py:272 in _forward │
│ │
│ 269 │ │ return checkpoint(self._forward, (x, context), self.parameters(), self.ch │
│ 270 │ │
│ 271 │ def _forward(self, x, context=None): │
│ ❱ 272 │ │ x = self.attn1(self.norm1(x), context=context if self.disable_self_attn e │
│ 273 │ │ x = self.attn2(self.norm2(x), context=context) + x │
│ 274 │ │ x = self.ff(self.norm3(x)) + x │
│ 275 │ │ return x │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py: │
│ 1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /mnt/workspace/stablediffusion/ldm/modules/attention.py:233 in forward │
│ │
│ 230 │ │ ) │
│ 231 │ │ │
│ 232 │ │ # actually compute the attention, what we cannot get enough of │
│ ❱ 233 │ │ out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None, op │
│ 234 │ │ │
│ 235 │ │ if exists(mask): │
│ 236 │ │ │ raise NotImplementedError │
│ │
│ /mnt/workspace/xformers/xformers/ops/fmha/__init__.py:192 in memory_efficient_attention │
│ │
│ 189 │ │ and options. │
│ 190 │ :return: multi-head attention Tensor with shape ``[B, Mq, H, Kv]`` │
│ 191 │ """ │
│ ❱ 192 │ return _memory_efficient_attention( │
│ 193 │ │ Inputs( │
│ 194 │ │ │ query=query, key=key, value=value, p=p, attn_bias=attn_bias, scale=sc │
│ 195 │ │ ), │
│ │
│ /mnt/workspace/xformers/xformers/ops/fmha/__init__.py:290 in │
│ _memory_efficient_attention │
│ │
│ 287 ) -> torch.Tensor: │
│ 288 │ # fast-path that doesn't require computing the logsumexp for backward computa │
│ 289 │ if all(x.requires_grad is False for x in [inp.query, inp.key, inp.value]): │
│ ❱ 290 │ │ return _memory_efficient_attention_forward( │
│ 291 │ │ │ inp, op=op[0] if op is not None else None │
│ 292 │ │ ) │
│ 293 │
│ │
│ /mnt/workspace/xformers/xformers/ops/fmha/__init__.py:306 in │
│ _memory_efficient_attention_forward │
│ │
│ 303 │ inp.validate_inputs() │
│ 304 │ output_shape = inp.normalize_bmhk() │
│ 305 │ if op is None: │
│ ❱ 306 │ │ op = _dispatch_fw(inp) │
│ 307 │ else: │
│ 308 │ │ _ensure_op_supports_or_raise(ValueError, "memory_efficient_attention", op │
│ 309 │
│ │
│ /mnt/workspace/xformers/xformers/ops/fmha/dispatch.py:98 in _dispatch_fw │
│ │
│ 95 │ if _is_triton_fwd_fastest(inp): │
│ 96 │ │ priority_list_ops.remove(triton.FwOp) │
│ 97 │ │ priority_list_ops.insert(0, triton.FwOp) │
│ ❱ 98 │ return _run_priority_list( │
│ 99 │ │ "memory_efficient_attention_forward", priority_list_ops, inp │
│ 100 │ ) │
│ 101 │
│ │
│ /mnt/workspace/xformers/xformers/ops/fmha/dispatch.py:73 in _run_priority_list │
│ │
│ 70 {textwrap.indent(_format_inputs_description(inp), ' ')}""" │
│ 71 │ for op, not_supported in zip(priority_list, not_supported_reasons): │
│ 72 │ │ msg += "\n" + _format_not_supported_reasons(op, not_supported) │
│ ❱ 73 │ raise NotImplementedError(msg) │
│ 74 │
│ 75 │
│ 76 def _dispatch_fw(inp: Inputs) -> Type[AttentionFwOpBase]: │
╰─────────────────────────────────────────────────────────────────────────────────────────╯
NotImplementedError: No operator found for `memory_efficient_attention_forward` with
inputs:
query : shape=(30, 9216, 1, 64) (torch.float32)
key : shape=(30, 9216, 1, 64) (torch.float32)
value : shape=(30, 9216, 1, 64) (torch.float32)
attn_bias : <class 'NoneType'>
p : 0.0
`cutlassF` is not supported because:
device=cpu (supported: {'cuda'})
`flshattF` is not supported because:
device=cpu (supported: {'cuda'})
dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
`tritonflashattF` is not supported because:
device=cpu (supported: {'cuda'})
dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
`smallkF` is not supported because:
max(query.shape[-1] != value.shape[-1]) > 32
unsupported embed per head: 64
Could you try removing the --xformers flag?