I tried installing Stable Diffusion and xFormers, and ran into this error:
NotImplementedError: Could not run 'xformers::efficient_attention_forward_cutlass' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'xformers::efficient_attention_forward_cutlass' is only available for these backends: [UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].
BackendSelect: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/PythonFallbackKernel.cpp:133 [backend fallback]
Named: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback]
AutogradXLA: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:51 [backend fallback]
AutogradMPS: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:59 [backend fallback]
AutogradXPU: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback]
AutogradHPU: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:68 [backend fallback]
AutogradLazy: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:55 [backend fallback]
Tracer: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/torch/csrc/autograd/TraceTypeManual.cpp:295 [backend fallback]
AutocastCPU: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/autocast_mode.cpp:481 [backend fallback]
Autocast: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
Batched: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
Functionalize: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/FunctionalizeFallbackKernel.cpp:89 [backend fallback]
PythonTLSSnapshot: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/PythonFallbackKernel.cpp:137 [backend fallback]
Time taken: 4.12s
Torch active/reserved: 2117/2130 MiB, Sys VRAM: 3046/3912 MiB (77.86%)
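As far as I can tell, the operator it complains about, xformers::efficient_attention_forward_cutlass, is what xformers.ops.memory_efficient_attention dispatches to on CUDA. A minimal way to reproduce this outside the webui (just a sketch: the tensor shapes are illustrative and it assumes a CUDA build of PyTorch) is:

import torch
import xformers.ops

# Tiny smoke test for the memory-efficient attention kernel.
# Shapes are (batch, sequence length, per-head dim) and purely illustrative.
q = torch.randn(1, 64, 40, device="cuda", dtype=torch.half)
k = torch.randn(1, 64, 40, device="cuda", dtype=torch.half)
v = torch.randn(1, 64, 40, device="cuda", dtype=torch.half)

# This dispatches to xformers::efficient_attention_forward_cutlass and fails
# with the same NotImplementedError when the installed xformers build has no
# CUDA kernel for that operator.
out = xformers.ops.memory_efficient_attention(q, k, v)
print(out.shape)

If that fails too, the problem is in the xformers build itself rather than in the Stable Diffusion code.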
Help me resolve this, good people of the internet.
Have you solved it? I am running into a similar problem.
On Windows with CUDA 11.3, PyTorch 1.10, and Python 3.9 I installed the prebuilt wheel with: pip install https://github.com/neonsecret/xformers/releases/download/v0.14/xformers-0.0.14.dev0-cp39-cp39-win_amd64.whl
The install itself succeeds, but when I run the script I get this error (see also the note after the traceback below):
H:\Anaconda3\envs\sd_fast\python.exe H:/22.10.24Draw/stable-diffusion/optimizedSD/optimized_txt2img.py
NOTE: Redirects are currently not supported in Windows or MacOs.
Global seed set to 883403
Loading model from ../models/ldm/stable-diffusion-v1/model.ckpt
Global Step: 470000
UNet: Running in eps-prediction mode
CondStage: Running in eps-prediction mode
FirstStage: Running in eps-prediction mode
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Sampling: 0%| | 0/1 [00:00<?, ?it/s]
data: 0%| | 0/1 [00:00<?, ?it/s]
seeds used = [883403]
Data shape for PLMS sampling is [1, 4, 64, 64]
Running PLMS Sampling with 50 timesteps
PLMS Sampler: 0%| | 0/50 [00:00<?, ?it/s]WARNING:root:WARNING: [WinError 127] The specified program could not be found.
[WinError 127] The specified program could not be found.
Need to compile C++ extensions to get sparse attention support. Please run python setup.py build develop
PLMS Sampler: 0%| | 0/50 [00:01<?, ?it/s]
data: 0%| | 0/1 [00:06<?, ?it/s]
Sampling: 0%| | 0/1 [00:06<?, ?it/s]
┌───────────────────── Traceback (most recent call last) ─────────────────────┐
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\optimized_txt2img.py:410 in │
│ │
│ │
│ 407 │ │ _modelCS.half() │
│ 408 │ │ _modelFS.half() │
│ 409 │ │
│ > 410 │ all_samples = get_image( │
│ 411 │ │ opt, │
│ 412 │ │ _model, │
│ 413 │ │ modelCS, │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\optimized_txt2img.py:153 in │
│ get_image │
│ │
│ 150 │ │ │ │ │ │ modelCS.to("cpu") │
│ 151 │ │ │ │ │ │ while torch.cuda.memory_allocated(device=opt. │
│ 152 │ │ │ │ │ │ │ time.sleep(1) │
│ > 153 │ │ │ │ │ samples_ddim = model.sample( │
│ 154 │ │ │ │ │ │ x0=(z_enc if opt.sampler == "ddim" else init │
│ 155 │ │ │ │ │ │ batch_size=batch_size, │
│ 156 │ │ │ │ │ │ S=opt.ddim_steps, │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\autograd\grad_mode.py:27 │
│ in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ > 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\ddpm.py:590 in sample │
│ │
│ 587 │ │ │
│ 588 │ │ if sampler == "plms": │
│ 589 │ │ │ print(f'Data shape for PLMS sampling is {shape}') │
│ > 590 │ │ │ samples = self.plms_sampling(conditioning, batch_size, x │
│ 591 │ │ │ │ │ │ │ │ │ │ callback=callback, │
│ 592 │ │ │ │ │ │ │ │ │ │ img_callback=img_callback, │
│ 593 │ │ │ │ │ │ │ │ │ │ quantize_denoised=quantize_x │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\autograd\grad_mode.py:27 │
│ in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ > 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\ddpm.py:689 in plms_sampling │
│ │
│ 686 │ │ │ │ img = img_orig * mask + (1. - mask) * img │
│ 687 │ │ │ │ del img_orig │
│ 688 │ │ │ │
│ > 689 │ │ │ outs = self.p_sample_plms(img, cond, ts, index=index, use │
│ 690 │ │ │ │ │ │ │ │ │ quantize_denoised=quantize_deno │
│ 691 │ │ │ │ │ │ │ │ │ noise_dropout=noise_dropout, sc │
│ 692 │ │ │ │ │ │ │ │ │ corrector_kwargs=corrector_kwar │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\autograd\grad_mode.py:27 │
│ in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ > 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\ddpm.py:753 in p_sample_plms │
│ │
│ 750 │ │ │ x_prev = a_prev.sqrt() * pred_x0 + dir_xt + noise │
│ 751 │ │ │ return x_prev, pred_x0 │
│ 752 │ │ │
│ > 753 │ │ e_t = get_model_output(x, t, speed_mp=speed_mp) │
│ 754 │ │ if len(old_eps) == 0: │
│ 755 │ │ │ # Pseudo Improved Euler (2nd order) │
│ 756 │ │ │ x_prev, pred_x0 = get_x_prev_and_pred_x0(e_t, index) │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\ddpm.py:720 in │
│ get_model_output │
│ │
│ 717 │ │ │ │ x_in = torch.cat([x] * 2) │
│ 718 │ │ │ │ t_in = torch.cat([t] * 2) │
│ 719 │ │ │ │ c_in = torch.cat([unconditional_conditioning, c]) │
│ > 720 │ │ │ │ e_t_uncond, e_t = self.apply_model(x_in, t_in, c_in, │
│ 721 │ │ │ │ e_t = e_t_uncond + unconditional_guidance_scale * (e │
│ 722 │ │ │ │
│ 723 │ │ │ if score_corrector is not None: │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\ddpm.py:483 in apply_model │
│ │
│ 480 │ │ │ self.model1.to(self.cdevice) │
│ 481 │ │ │
│ 482 │ │ step = self.unet_bs │
│ > 483 │ │ h, emb, hs = self.model1(x_noisy[0:step], t[:step], cond[:ste │
│ 484 │ │ bs = cond.shape[0] │
│ 485 │ │ │
│ 486 │ │ # assert bs%2 == 0 │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\nn\modules\module.py:1110 │
│ in _call_impl │
│ │
│ 1107 │ │ # this function, and just call forward. │
│ 1108 │ │ if not (self._backward_hooks or self._forward_hooks or self. │
│ 1109 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hook │
│ > 1110 │ │ │ return forward_call(*input, **kwargs) │
│ 1111 │ │ # Do not call functions when jit is used │
│ 1112 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1113 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\ddpm.py:323 in forward │
│ │
│ 320 │ │ self.diffusion_model = instantiate_from_config(diff_model_con │
│ 321 │ │
│ 322 │ def forward(self, x, t, cc, speed_mp): │
│ > 323 │ │ out = self.diffusion_model(x, t, context=cc, speed_mp=speed_m │
│ 324 │ │ return out │
│ 325 │
│ 326 │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\nn\modules\module.py:1110 │
│ in _call_impl │
│ │
│ 1107 │ │ # this function, and just call forward. │
│ 1108 │ │ if not (self._backward_hooks or self._forward_hooks or self. │
│ 1109 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hook │
│ > 1110 │ │ │ return forward_call(*input, **kwargs) │
│ 1111 │ │ # Do not call functions when jit is used │
│ 1112 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1113 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\openaimodelSplit.py:612 in │
│ forward │
│ │
│ 609 │ │ │
│ 610 │ │ h = x.type(self.dtype) │
│ 611 │ │ for module in self.input_blocks: │
│ > 612 │ │ │ h = module(h, emb, context, speed_mp) │
│ 613 │ │ │ hs.append(h) │
│ 614 │ │ h = self.middle_block(h, emb, context, speed_mp) │
│ 615 │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\nn\modules\module.py:1110 │
│ in _call_impl │
│ │
│ 1107 │ │ # this function, and just call forward. │
│ 1108 │ │ if not (self._backward_hooks or self._forward_hooks or self. │
│ 1109 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hook │
│ > 1110 │ │ │ return forward_call(*input, **kwargs) │
│ 1111 │ │ # Do not call functions when jit is used │
│ 1112 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1113 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\openaimodelSplit.py:73 in │
│ forward │
│ │
│ 70 │ │ │ if isinstance(layer, TimestepBlock): │
│ 71 │ │ │ │ x = layer(x, emb) │
│ 72 │ │ │ elif isinstance(layer, SpatialTransformer): │
│ > 73 │ │ │ │ x = layer(x, context, speed_mp) │
│ 74 │ │ │ else: │
│ 75 │ │ │ │ x = layer(x) │
│ 76 │ │ return x │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\nn\modules\module.py:1110 │
│ in _call_impl │
│ │
│ 1107 │ │ # this function, and just call forward. │
│ 1108 │ │ if not (self._backward_hooks or self._forward_hooks or self. │
│ 1109 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hook │
│ > 1110 │ │ │ return forward_call(*input, **kwargs) │
│ 1111 │ │ # Do not call functions when jit is used │
│ 1112 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1113 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ H:\22.10.24Draw\stable-diffusion\ldm\modules\attention.py:349 in forward │
│ │
│ 346 │ │ x = self.proj_in(x) │
│ 347 │ │ x = rearrange(x, 'b c h w -> b (h w) c') │
│ 348 │ │ for block in self.transformer_blocks: │
│ > 349 │ │ │ x = block(x, speed_mp=speed_mp, context=context, fucking │
│ 350 │ │ x = rearrange(x, 'b (h w) c -> b c h w', h=h, w=w) │
│ 351 │ │ x = self.proj_out(x) │
│ 352 │ │ return x + x_in │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\nn\modules\module.py:1110 │
│ in _call_impl │
│ │
│ 1107 │ │ # this function, and just call forward. │
│ 1108 │ │ if not (self._backward_hooks or self._forward_hooks or self. │
│ 1109 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hook │
│ > 1110 │ │ │ return forward_call(*input, **kwargs) │
│ 1111 │ │ # Do not call functions when jit is used │
│ 1112 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1113 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ H:\22.10.24Draw\stable-diffusion\ldm\modules\attention.py:299 in forward │
│ │
│ 296 │ │ self.checkpoint = checkpoint │
│ 297 │ │
│ 298 │ def forward(self, x, speed_mp=None, context=None, fucking_hell=Fa │
│ > 299 │ │ return checkpoint(self._forward, (x, speed_mp, context, fucki │
│ 300 │ │
│ 301 │ def _forward(self, x, speed_mp=None, context=None, fucking_hell=F │
│ 302 │ │ x = self.attn1(self.norm1(x), speed_mp=speed_mp, dtype=x.dtyp │
│ │
│ H:\22.10.24Draw\stable-diffusion\ldm\modules\diffusionmodules\util.py:114 │
│ in checkpoint │
│ │
│ 111 │ """ │
│ 112 │ if flag: │
│ 113 │ │ args = tuple(inputs) + tuple(params) │
│ > 114 │ │ return CheckpointFunction.apply(func, len(inputs), *args) │
│ 115 │ else: │
│ 116 │ │ return func(*inputs) │
│ 117 │
│ │
│ H:\22.10.24Draw\stable-diffusion\ldm\modules\diffusionmodules\util.py:127 │
│ in forward │
│ │
│ 124 │ │ ctx.input_params = list(args[length:]) │
│ 125 │ │ │
│ 126 │ │ with torch.no_grad(): │
│ > 127 │ │ │ output_tensors = ctx.run_function(*ctx.input_tensors) │
│ 128 │ │ return output_tensors │
│ 129 │ │
│ 130 │ @staticmethod │
│ │
│ H:\22.10.24Draw\stable-diffusion\ldm\modules\attention.py:302 in _forward │
│ │
│ 299 │ │ return checkpoint(self._forward, (x, speed_mp, context, fucki │
│ 300 │ │
│ 301 │ def _forward(self, x, speed_mp=None, context=None, fucking_hell=F │
│ > 302 │ │ x = self.attn1(self.norm1(x), speed_mp=speed_mp, dtype=x.dtyp │
│ 303 │ │ x = self.attn2(self.norm2(x), speed_mp=speed_mp, context=cont │
│ 304 │ │ x = self.ff(self.norm3(x)) + x │
│ 305 │ │ return x │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\nn\modules\module.py:1110 │
│ in _call_impl │
│ │
│ 1107 │ │ # this function, and just call forward. │
│ 1108 │ │ if not (self._backward_hooks or self._forward_hooks or self. │
│ 1109 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hook │
│ > 1110 │ │ │ return forward_call(*input, **kwargs) │
│ 1111 │ │ # Do not call functions when jit is used │
│ 1112 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1113 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ H:\22.10.24Draw\stable-diffusion\ldm\modules\attention.py:234 in forward │
│ │
│ 231 │ │
│ 232 │ def forward(self, x, speed_mp=None, context=None, mask=None, dtyp │
│ 233 │ │ if speed_mp: │
│ > 234 │ │ │ return self.light_forward(x, context=context, mask=mask, │
│ 235 │ │ h = self.heads │
│ 236 │ │ device = x.device │
│ 237 │ │ secondary_device = device if (self.fast_forward and sys.platf │
│ │
│ H:\22.10.24Draw\stable-diffusion\ldm\modules\attention.py:221 in │
│ light_forward │
│ │
│ 218 │ │ self._maybe_init(q) │
│ 219 │ │ │
│ 220 │ │ # actually compute the attention, what we cannot get enough o │
│ > 221 │ │ out = xformers.ops.memory_efficient_attention(q, k, v, attn_b │
│ 222 │ │ del q, k, v │
│ 223 │ │ │
│ 224 │ │ out = ( │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\xformers\ops.py:778 in │
│ memory_efficient_attention │
│ │
│ 775 │ │
│ 776 │ # fast-path that doesn't require computing the logsumexp for back │
│ 777 │ if all(x.requires_grad is False for x in [query, key, value]): │
│ > 778 │ │ return op.forward_no_grad( │
│ 779 │ │ │ query=query, key=key, value=value, attn_bias=attn_bias, p │
│ 780 │ │ ).reshape(output_shape) │
│ 781 │ return op.apply(query, key, value, attn_bias, p).reshape(output_s │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\xformers\ops.py:304 in │
│ forward_no_grad │
│ │
│ 301 │ │ attn_bias: Optional[Union[torch.Tensor, AttentionMask]], │
│ 302 │ │ p: float, │
│ 303 │ ) -> torch.Tensor: │
│ > 304 │ │ return cls.FORWARD_OPERATOR( │
│ 305 │ │ │ query=query, │
│ 306 │ │ │ key=key, │
│ 307 │ │ │ value=value, │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\xformers\ops.py:45 in │
│ no_such_operator │
│ │
│ 42 │
│ 43 def _get_xformers_operator(name: str): │
│ 44 │ def no_such_operator(*args, **kwargs): │
│ > 45 │ │ raise RuntimeError( │
│ 46 │ │ │ f"No such operator xformers::{name} - did you forget to b │
│ 47 │ │ ) │
│ 48 │
└─────────────────────────────────────────────────────────────────────────────┘
RuntimeError: No such operator xformers::efficient_attention_forward_cutlass -
did you forget to build xformers with python setup.py develop?
Process finished with exit code 1
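In case it is useful to others hitting this: the WARNING: [WinError 127] line above is, as far as I understand it, xformers failing to load its compiled extension when the package is imported, so none of the xformers operators ever get registered and memory_efficient_attention falls back to the no_such_operator stub seen in the traceback. A quick check that is independent of Stable Diffusion (just a sketch of how to probe it) is:

import torch
import xformers  # importing the package is what attempts to load the compiled extension

# The compiled kernels register themselves under the torch.ops.xformers
# namespace; if the extension failed to load (e.g. the WinError 127 above),
# this lookup raises instead of returning the operator.
print(getattr(torch.ops.xformers, "efficient_attention_forward_cutlass"))

If that raises, the prebuilt wheel's binaries are not usable on this setup (compiled extensions have to match the exact PyTorch and CUDA versions), and the hint at the bottom of the traceback, building xformers from source with python setup.py develop in a clone of the repo, is probably the way forward.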