RuntimeError: Input type (c10::Half) and bias type (float) should be the same
I am using an M1 Pro MacBook and I am trying to run Stable Diffusion using MPS.
I changed the CUDA-specific parts to MPS and switched ddim.py to float32, because MPS does not support float64:
def register_buffer(self, name, attr):
    if type(attr) == torch.Tensor:
        if attr.device != torch.device("mps"):
            attr = attr.to(torch.float32).to(torch.device("mps"))
    setattr(self, name, attr)

def make_schedule(self, ddim_num_steps, ddim_discretize="uniform", ddim_eta=0., verbose=True):
    self.ddim_timesteps = make_ddim_timesteps(ddim_discr_method=ddim_discretize, num_ddim_timesteps=ddim_num_steps,
                                              num_ddpm_timesteps=self.ddpm_num_timesteps, verbose=verbose)
    alphas_cumprod = self.model.alphas_cumprod
    assert alphas_cumprod.shape[0] == self.ddpm_num_timesteps, 'alphas have to be defined for each timestep'
    to_torch = lambda x: x.clone().detach().to(torch.float32).to(self.model.device)
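For context, the float32 cast in register_buffer above is there because MPS rejects double-precision tensors; a quick sketch of the failure and the workaround (the exact exception text varies with the PyTorch version):

import torch

# Assumes Apple Silicon with the MPS backend available.
if torch.backends.mps.is_available():
    x64 = torch.ones(3, dtype=torch.float64)
    try:
        x64.to("mps")                        # float64 tensors cannot live on MPS
    except (TypeError, RuntimeError) as e:   # exception type varies by PyTorch version
        print(e)
    x32 = x64.to(torch.float32).to("mps")    # cast first, then move: this works
    print(x32.dtype, x32.device)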
Since making that change, this error has been raised from conv.py:
def _conv_forward(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]):
    if self.padding_mode != 'zeros':
        return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
                        weight, bias, self.stride,
                        _pair(0), self.dilation, self.groups)
    return F.conv2d(input, weight, bias, self.stride,
                    self.padding, self.dilation, self.groups)

def forward(self, input: Tensor) -> Tensor:
    return self._conv_forward(input, self.weight, self.bias)
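The error itself only means that the tensors reaching F.conv2d do not share a dtype, here a half-precision input or weight meeting a float32 bias. A minimal, hypothetical reproduction and the usual remedy of keeping the module and its inputs in one dtype (the exact error wording depends on backend and PyTorch version):

import torch
import torch.nn as nn

conv = nn.Conv2d(4, 4, kernel_size=3, padding=1)   # weight and bias start as float32
conv.weight.data = conv.weight.data.half()         # weight -> float16, bias left as float32
x = torch.randn(1, 4, 8, 8, dtype=torch.float16)

try:
    conv(x)                                         # mixed dtypes inside a single conv call
except RuntimeError as e:
    print(e)                                        # a dtype-mismatch error similar to the one above

# Remedy sketch: keep the whole module and its inputs in a single dtype.
conv = conv.float()
out = conv(x.float())
print(out.dtype)                                    # torch.float32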
Help me please.
Hey, how is it going? Did you figure it out?
No, I didn't 😢
Well, that's too bad. I guess we're stuck then.
Are you having the same problem?
I was having that problem when I altered superresolution.py for my use case. You could try running the pipeline provided by diffusers instead (see the sketch below the error). Since this issue got no response, I switched to running txt2img.py, and now I am getting a new error:
(base) root@stablediffusion$ python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt v2-1_768-nonema-pruned.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768
"RuntimeError: expected scalar type BFloat16 but found Float"
Wait, you're getting the problem from something else now, not the superresolution.py script?
OK, I was fooling around and got the "RuntimeError: Input type (c10::Half) and bias type (float) should be the same" error again. It doesn't show up when you use "cuda". Why are you trying MPS instead?
You can use this fork, which supports MPS! https://github.com/Tps-F/stablediffusion
It doesn't work 🥺
This error occurred:
Traceback (most recent call last):
File "/Users/blackcat/study/stablediffusion/scripts/txt2img.py", line 393, in
After days of troubleshooting, I was able to resolve this by upgrading tensorflow to 2.11.0 and setting the use_fp16 parameter in v2-inference.yaml to False.
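If you would rather not edit the YAML by hand, the same override can be applied in code before the model is built; a rough sketch using OmegaConf (which the repo's scripts already use to load these configs), where the exact key path is my guess and may differ between config files:

from omegaconf import OmegaConf

config = OmegaConf.load("configs/stable-diffusion/v2-inference.yaml")

# Hypothetical key path; print the loaded config if your file is laid out differently.
config.model.params.unet_config.params.use_fp16 = False

# The patched config is then passed to the usual model loader, e.g.
# model = load_model_from_config(config, "v2-1_768-nonema-pruned.ckpt")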
Try to use v2-inference-v-mac.yaml
@lakejee-rebel @Tps-F How long does it take to execute for you? It takes an hour to create an image with Tps-F's Stable Diffusion fork.
Here it is!
https://github.com/Stability-AI/stablediffusion/pull/163#issuecomment-1422351441
@Tps-F It's faster now because I reduced the batch size. Thank you. Are you interested in object detection models like SSD (Single Shot MultiBox Detector) or YOLO? I want to try SSD on an M1 Mac, but that model's code targets CUDA. How do I convert the CUDA parts to MPS?
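(For anyone reading along: the usual first step when porting CUDA-only code to Apple Silicon is to stop hard-coding .cuda() and pick the device at runtime; a generic sketch, not tied to any particular SSD or YOLO implementation:)

import torch

# Pick the best available backend instead of assuming CUDA.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Then replace every `model.cuda()` / `tensor.cuda()` call with `.to(device)`:
# model = model.to(device)
# images = images.to(device)
# Ops that MPS does not implement can be routed to CPU by setting
# PYTORCH_ENABLE_MPS_FALLBACK=1 in the environment before launching Python.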
Shall I do it?
I'd appreciate it if you did that.
@Tps-F Can I follow you?
Sure! By the way, there seem to be multiple versions of SSD and YOLO; which one should I support?
Since we shouldn't keep chatting here, would you like to move to Discord or something?
Okay, good. What is your Discord ID? I will follow you.
Thank you- Ftps#3389
@Tps-F Hi, I get a similar error using your fork:
.../venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (float) and bias type (c10::Half) should be the same
I used your v2-inference-v-mac.yaml as well and updated tensorflow to 2.11.0 as suggested, but it doesn't work...
Can I connect with you on Discord? I already sent a request... :)
I would like to see all the logs and what you have run. Can you show me?
Sure! But I might as well talk about it here in case anyone encounters a similar error in the future!
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
I also got the above problem, @yyahav. I am using Ubuntu, not macOS.
@tommysnu Can you please share the entire stacktrace? I've made a change in the code which seems to work for me
Likewise, please share your logs with us so I can improve.
Traceback (most recent call last):
File "/mnt/workspace/stablediffusion/scripts/txt2img.py", line 388, in <module>
main(opt)
File "/mnt/workspace/stablediffusion/scripts/txt2img.py", line 347, in main
samples, _ = sampler.sample(S=opt.steps,
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py", line 104, in sample
samples, intermediates = self.ddim_sampling(conditioning, size,
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py", line 164, in ddim_sampling
outs = self.p_sample_ddim(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py", line 212, in p_sample_ddim
model_uncond, model_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddpm.py", line 858, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/workspace/stablediffusion/ldm/models/diffusion/ddpm.py", line 1335, in forward
out = self.diffusion_model(x, t, context=cc)
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/openaimodel.py", line 797, in forward
h = module(h, emb, context)
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/openaimodel.py", line 86, in forward
x = layer(x)
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
These are my logs after running:
python scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt v2-1_768-ema-pruned.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768
(Link: https://github.com/Stability-AI/stablediffusion#reference-sampling-script)
Could you give me any suggestions, @yyahav and @Tps-F? Thank you so much.
I know you are using Ubuntu, but could you try using the config for Mac? https://github.com/Tps-F/stablediffusion/blob/mps-cpu-support/configs/stable-diffusion/mac/v2-inference-v-mac.yaml
I think the reason this happens is that you are using fp16.
Thanks Tps-F. After using this config file I get another error, as below:
Sampling:   0%|          | 0/3 [00:00<?, ?it/s]
Data shape for DDIM sampling is (3, 4, 96, 96), eta 0.0
Running DDIM Sampling with 50 timesteps
DDIM Sampler: 0%| | 0/50 [00:00<?, ?it/s]
data: 0%| | 0/1 [00:02<?, ?it/s]
Sampling: 0%| | 0/3 [00:02<?, ?it/s]
╭─────────────────────────── Traceback (most recent call last) ───────────────────────────╮
│ /mnt/workspace/stablediffusion/scripts/txt2img.py:388 in <module> │
│ │
│ 385 │
│ 386 if __name__ == "__main__": │
│ 387 │ opt = parse_args() │
│ ❱ 388 │ main(opt) │
│ 389 │
│ │
│ /mnt/workspace/stablediffusion/scripts/txt2img.py:347 in main │
│ │
│ 344 │ │ │ │ │ │ prompts = list(prompts) │
│ 345 │ │ │ │ │ c = model.get_learned_conditioning(prompts) │
│ 346 │ │ │ │ │ shape = [opt.C, opt.H // opt.f, opt.W // opt.f] │
│ ❱ 347 │ │ │ │ │ samples, _ = sampler.sample(S=opt.steps, │
│ 348 │ │ │ │ │ │ │ │ │ │ │ │ │ conditioning=c, │
│ 349 │ │ │ │ │ │ │ │ │ │ │ │ │ batch_size=opt.n_samples, │
│ 350 │ │ │ │ │ │ │ │ │ │ │ │ │ shape=shape, │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py │
│ :27 in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ ❱ 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ /mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py:104 in sample │
│ │
│ 101 │ │ size = (batch_size, C, H, W) │
│ 102 │ │ print(f'Data shape for DDIM sampling is {size}, eta {eta}') │
│ 103 │ │ │
│ ❱ 104 │ │ samples, intermediates = self.ddim_sampling(conditioning, size, │
│ 105 │ │ │ │ │ │ │ │ │ │ │ │ │ callback=callback, │
│ 106 │ │ │ │ │ │ │ │ │ │ │ │ │ img_callback=img_callback, │
│ 107 │ │ │ │ │ │ │ │ │ │ │ │ │ quantize_denoised=quantize_x0 │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py │
│ :27 in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ ❱ 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ /mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py:164 in ddim_sampling │
│ │
│ 161 │ │ │ │ assert len(ucg_schedule) == len(time_range) │
│ 162 │ │ │ │ unconditional_guidance_scale = ucg_schedule[i] │
│ 163 │ │ │ │
│ ❱ 164 │ │ │ outs = self.p_sample_ddim(img, cond, ts, index=index, use_original_st │
│ 165 │ │ │ │ │ │ │ │ │ quantize_denoised=quantize_denoised, temper │
│ 166 │ │ │ │ │ │ │ │ │ noise_dropout=noise_dropout, score_correcto │
│ 167 │ │ │ │ │ │ │ │ │ corrector_kwargs=corrector_kwargs, │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/autograd/grad_mode.py │
│ :27 in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ ❱ 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ /mnt/workspace/stablediffusion/ldm/models/diffusion/ddim.py:212 in p_sample_ddim │
│ │
│ 209 │ │ │ │ │ c_in.append(torch.cat([unconditional_conditioning[i], c[i]])) │
│ 210 │ │ │ else: │
│ 211 │ │ │ │ c_in = torch.cat([unconditional_conditioning, c]) │
│ ❱ 212 │ │ │ model_uncond, model_t = self.model.apply_model(x_in, t_in, c_in).chun │
│ 213 │ │ │ model_output = model_uncond + unconditional_guidance_scale * (model_t │
│ 214 │ │ │
│ 215 │ │ if self.model.parameterization == "v": │
│ │
│ /mnt/workspace/stablediffusion/ldm/models/diffusion/ddpm.py:858 in apply_model │
│ │
│ 855 │ │ │ key = 'c_concat' if self.model.conditioning_key == 'concat' else 'c_ │
│ 856 │ │ │ cond = {key: cond} │
│ 857 │ │ │
│ ❱ 858 │ │ x_recon = self.model(x_noisy, t, **cond) │
│ 859 │ │ │
│ 860 │ │ if isinstance(x_recon, tuple) and not return_ids: │
│ 861 │ │ │ return x_recon[0] │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py: │
│ 1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /mnt/workspace/stablediffusion/ldm/models/diffusion/ddpm.py:1335 in forward │
│ │
│ 1332 │ │ │ │ # an error: RuntimeError: forward() is missing value for argumen │
│ 1333 │ │ │ │ out = self.scripted_diffusion_model(x, t, cc) │
│ 1334 │ │ │ else: │
│ ❱ 1335 │ │ │ │ out = self.diffusion_model(x, t, context=cc) │
│ 1336 │ │ elif self.conditioning_key == 'hybrid': │
│ 1337 │ │ │ xc = torch.cat([x] + c_concat, dim=1) │
│ 1338 │ │ │ cc = torch.cat(c_crossattn, 1) │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py: │
│ 1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/openaimodel.py:797 in │
│ forward │
│ │
│ 794 │ │ │
│ 795 │ │ h = x.type(self.dtype) │
│ 796 │ │ for module in self.input_blocks: │
│ ❱ 797 │ │ │ h = module(h, emb, context) │
│ 798 │ │ │ hs.append(h) │
│ 799 │ │ h = self.middle_block(h, emb, context) │
│ 800 │ │ for module in self.output_blocks: │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py: │
│ 1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/openaimodel.py:84 in │
│ forward │
│ │
│ 81 │ │ │ if isinstance(layer, TimestepBlock): │
│ 82 │ │ │ │ x = layer(x, emb) │
│ 83 │ │ │ elif isinstance(layer, SpatialTransformer): │
│ ❱ 84 │ │ │ │ x = layer(x, context) │
│ 85 │ │ │ else: │
│ 86 │ │ │ │ x = layer(x) │
│ 87 │ │ return x │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py: │
│ 1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /mnt/workspace/stablediffusion/ldm/modules/attention.py:334 in forward │
│ │
│ 331 │ │ if self.use_linear: │
│ 332 │ │ │ x = self.proj_in(x) │
│ 333 │ │ for i, block in enumerate(self.transformer_blocks): │
│ ❱ 334 │ │ │ x = block(x, context=context[i]) │
│ 335 │ │ if self.use_linear: │
│ 336 │ │ │ x = self.proj_out(x) │
│ 337 │ │ x = rearrange(x, 'b (h w) c -> b c h w', h=h, w=w).contiguous() │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py: │
│ 1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /mnt/workspace/stablediffusion/ldm/modules/attention.py:269 in forward │
│ │
│ 266 │ │ self.checkpoint = checkpoint │
│ 267 │ │
│ 268 │ def forward(self, x, context=None): │
│ ❱ 269 │ │ return checkpoint(self._forward, (x, context), self.parameters(), self.ch │
│ 270 │ │
│ 271 │ def _forward(self, x, context=None): │
│ 272 │ │ x = self.attn1(self.norm1(x), context=context if self.disable_self_attn e │
│ │
│ /mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/util.py:121 in checkpoint │
│ │
│ 118 │ """ │
│ 119 │ if flag: │
│ 120 │ │ args = tuple(inputs) + tuple(params) │
│ ❱ 121 │ │ return CheckpointFunction.apply(func, len(inputs), *args) │
│ 122 │ else: │
│ 123 │ │ return func(*inputs) │
│ 124 │
│ │
│ /mnt/workspace/stablediffusion/ldm/modules/diffusionmodules/util.py:136 in forward │
│ │
│ 133 │ │ │ │ │ │ │ │ "dtype": torch.get_autocast_gpu_dtype(), │
│ 134 │ │ │ │ │ │ │ │ "cache_enabled": torch.is_autocast_cache_enabl │
│ 135 │ │ with torch.no_grad(): │
│ ❱ 136 │ │ │ output_tensors = ctx.run_function(*ctx.input_tensors) │
│ 137 │ │ return output_tensors │
│ 138 │ │
│ 139 │ @staticmethod │
│ │
│ /mnt/workspace/stablediffusion/ldm/modules/attention.py:272 in _forward │
│ │
│ 269 │ │ return checkpoint(self._forward, (x, context), self.parameters(), self.ch │
│ 270 │ │
│ 271 │ def _forward(self, x, context=None): │
│ ❱ 272 │ │ x = self.attn1(self.norm1(x), context=context if self.disable_self_attn e │
│ 273 │ │ x = self.attn2(self.norm2(x), context=context) + x │
│ 274 │ │ x = self.ff(self.norm3(x)) + x │
│ 275 │ │ return x │
│ │
│ /home/tommy/anaconda3/envs/t2im/lib/python3.9/site-packages/torch/nn/modules/module.py: │
│ 1194 in _call_impl │
│ │
│ 1191 │ │ # this function, and just call forward. │
│ 1192 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre │
│ 1193 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1194 │ │ │ return forward_call(*input, **kwargs) │
│ 1195 │ │ # Do not call functions when jit is used │
│ 1196 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1197 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ /mnt/workspace/stablediffusion/ldm/modules/attention.py:233 in forward │
│ │
│ 230 │ │ ) │
│ 231 │ │ │
│ 232 │ │ # actually compute the attention, what we cannot get enough of │
│ ❱ 233 │ │ out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None, op │
│ 234 │ │ │
│ 235 │ │ if exists(mask): │
│ 236 │ │ │ raise NotImplementedError │
│ │
│ /mnt/workspace/xformers/xformers/ops/fmha/__init__.py:192 in memory_efficient_attention │
│ │
│ 189 │ │ and options. │
│ 190 │ :return: multi-head attention Tensor with shape ``[B, Mq, H, Kv]`` │
│ 191 │ """ │
│ ❱ 192 │ return _memory_efficient_attention( │
│ 193 │ │ Inputs( │
│ 194 │ │ │ query=query, key=key, value=value, p=p, attn_bias=attn_bias, scale=sc │
│ 195 │ │ ), │
│ │
│ /mnt/workspace/xformers/xformers/ops/fmha/__init__.py:290 in │
│ _memory_efficient_attention │
│ │
│ 287 ) -> torch.Tensor: │
│ 288 │ # fast-path that doesn't require computing the logsumexp for backward computa │
│ 289 │ if all(x.requires_grad is False for x in [inp.query, inp.key, inp.value]): │
│ ❱ 290 │ │ return _memory_efficient_attention_forward( │
│ 291 │ │ │ inp, op=op[0] if op is not None else None │
│ 292 │ │ ) │
│ 293 │
│ │
│ /mnt/workspace/xformers/xformers/ops/fmha/__init__.py:306 in │
│ _memory_efficient_attention_forward │
│ │
│ 303 │ inp.validate_inputs() │
│ 304 │ output_shape = inp.normalize_bmhk() │
│ 305 │ if op is None: │
│ ❱ 306 │ │ op = _dispatch_fw(inp) │
│ 307 │ else: │
│ 308 │ │ _ensure_op_supports_or_raise(ValueError, "memory_efficient_attention", op │
│ 309 │
│ │
│ /mnt/workspace/xformers/xformers/ops/fmha/dispatch.py:98 in _dispatch_fw │
│ │
│ 95 │ if _is_triton_fwd_fastest(inp): │
│ 96 │ │ priority_list_ops.remove(triton.FwOp) │
│ 97 │ │ priority_list_ops.insert(0, triton.FwOp) │
│ ❱ 98 │ return _run_priority_list( │
│ 99 │ │ "memory_efficient_attention_forward", priority_list_ops, inp │
│ 100 │ ) │
│ 101 │
│ │
│ /mnt/workspace/xformers/xformers/ops/fmha/dispatch.py:73 in _run_priority_list │
│ │
│ 70 {textwrap.indent(_format_inputs_description(inp), ' ')}""" │
│ 71 │ for op, not_supported in zip(priority_list, not_supported_reasons): │
│ 72 │ │ msg += "\n" + _format_not_supported_reasons(op, not_supported) │
│ ❱ 73 │ raise NotImplementedError(msg) │
│ 74 │
│ 75 │
│ 76 def _dispatch_fw(inp: Inputs) -> Type[AttentionFwOpBase]: │
╰─────────────────────────────────────────────────────────────────────────────────────────╯
NotImplementedError: No operator found for `memory_efficient_attention_forward` with
inputs:
query : shape=(30, 9216, 1, 64) (torch.float32)
key : shape=(30, 9216, 1, 64) (torch.float32)
value : shape=(30, 9216, 1, 64) (torch.float32)
attn_bias : <class 'NoneType'>
p : 0.0
`cutlassF` is not supported because:
device=cpu (supported: {'cuda'})
`flshattF` is not supported because:
device=cpu (supported: {'cuda'})
dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
`tritonflashattF` is not supported because:
device=cpu (supported: {'cuda'})
dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
`smallkF` is not supported because:
max(query.shape[-1] != value.shape[-1]) > 32
unsupported embed per head: 64
Could you try removing the --xformers flag?