I tried installing Stable Diffusion and xFormers, and ran into this error:
NotImplementedError: Could not run 'xformers::efficient_attention_forward_cutlass' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'xformers::efficient_attention_forward_cutlass' is only available for these backends: [UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].
BackendSelect: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/PythonFallbackKernel.cpp:133 [backend fallback]
Named: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback]
AutogradXLA: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:51 [backend fallback]
AutogradMPS: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:59 [backend fallback]
AutogradXPU: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback]
AutogradHPU: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:68 [backend fallback]
AutogradLazy: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/VariableFallbackKernel.cpp:55 [backend fallback]
Tracer: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/torch/csrc/autograd/TraceTypeManual.cpp:295 [backend fallback]
AutocastCPU: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/autocast_mode.cpp:481 [backend fallback]
Autocast: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
Batched: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
Functionalize: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/FunctionalizeFallbackKernel.cpp:89 [backend fallback]
PythonTLSSnapshot: registered at /opt/conda/conda-bld/pytorch_1659484803030/work/aten/src/ATen/core/PythonFallbackKernel.cpp:137 [backend fallback]
Time taken: 4.12s
Torch active/reserved: 2117/2130 MiB, Sys VRAM: 3046/3912 MiB (77.86%)
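As far as I can tell, the operator it complains about, xformers::efficient_attention_forward_cutlass, is what xformers.ops.memory_efficient_attention dispatches to on CUDA. A minimal way to reproduce this outside the webui (just a sketch: the tensor shapes are illustrative and it assumes a CUDA build of PyTorch) is:

import torch
import xformers.ops

# Tiny smoke test for the memory-efficient attention kernel.
# Shapes are (batch, sequence length, per-head dim) and purely illustrative.
q = torch.randn(1, 64, 40, device="cuda", dtype=torch.half)
k = torch.randn(1, 64, 40, device="cuda", dtype=torch.half)
v = torch.randn(1, 64, 40, device="cuda", dtype=torch.half)

# This dispatches to xformers::efficient_attention_forward_cutlass and fails
# with the same NotImplementedError when the installed xformers build has no
# CUDA kernel for that operator.
out = xformers.ops.memory_efficient_attention(q, k, v)
print(out.shape)

If that fails too, the problem is in the xformers build itself rather than in the Stable Diffusion code.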
Help me resolve this, good people of the internet.
Have you solved it? I am running into a similar problem.
On Windows with CUDA 11.3, PyTorch 1.10, and Python 3.9 I installed the prebuilt wheel with: pip install https://github.com/neonsecret/xformers/releases/download/v0.14/xformers-0.0.14.dev0-cp39-cp39-win_amd64.whl
The install itself succeeds, but when I run the script I get this error (see also the note after the traceback below):
H:\Anaconda3\envs\sd_fast\python.exe H:/22.10.24Draw/stable-diffusion/optimizedSD/optimized_txt2img.py
NOTE: Redirects are currently not supported in Windows or MacOs.
Global seed set to 883403
Loading model from ../models/ldm/stable-diffusion-v1/model.ckpt
Global Step: 470000
UNet: Running in eps-prediction mode
CondStage: Running in eps-prediction mode
FirstStage: Running in eps-prediction mode
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Sampling: 0%| | 0/1 [00:00<?, ?it/s]
data: 0%| | 0/1 [00:00<?, ?it/s]
seeds used = [883403]
Data shape for PLMS sampling is [1, 4, 64, 64]
Running PLMS Sampling with 50 timesteps
PLMS Sampler: 0%| | 0/50 [00:00<?, ?it/s]WARNING:root:WARNING: [WinError 127] The specified program could not be found.
[WinError 127] The specified program could not be found.
Need to compile C++ extensions to get sparse attention support. Please run python setup.py build develop
PLMS Sampler: 0%| | 0/50 [00:01<?, ?it/s]
data: 0%| | 0/1 [00:06<?, ?it/s]
Sampling: 0%| | 0/1 [00:06<?, ?it/s]
┌───────────────────── Traceback (most recent call last) ─────────────────────┐
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\optimized_txt2img.py:410 in │
│ │
│ │
│ 407 │ │ _modelCS.half() │
│ 408 │ │ _modelFS.half() │
│ 409 │ │
│ > 410 │ all_samples = get_image( │
│ 411 │ │ opt, │
│ 412 │ │ _model, │
│ 413 │ │ modelCS, │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\optimized_txt2img.py:153 in │
│ get_image │
│ │
│ 150 │ │ │ │ │ │ modelCS.to("cpu") │
│ 151 │ │ │ │ │ │ while torch.cuda.memory_allocated(device=opt. │
│ 152 │ │ │ │ │ │ │ time.sleep(1) │
│ > 153 │ │ │ │ │ samples_ddim = model.sample( │
│ 154 │ │ │ │ │ │ x0=(z_enc if opt.sampler == "ddim" else init │
│ 155 │ │ │ │ │ │ batch_size=batch_size, │
│ 156 │ │ │ │ │ │ S=opt.ddim_steps, │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\autograd\grad_mode.py:27 │
│ in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ > 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\ddpm.py:590 in sample │
│ │
│ 587 │ │ │
│ 588 │ │ if sampler == "plms": │
│ 589 │ │ │ print(f'Data shape for PLMS sampling is {shape}') │
│ > 590 │ │ │ samples = self.plms_sampling(conditioning, batch_size, x │
│ 591 │ │ │ │ │ │ │ │ │ │ callback=callback, │
│ 592 │ │ │ │ │ │ │ │ │ │ img_callback=img_callback, │
│ 593 │ │ │ │ │ │ │ │ │ │ quantize_denoised=quantize_x │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\autograd\grad_mode.py:27 │
│ in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ > 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\ddpm.py:689 in plms_sampling │
│ │
│ 686 │ │ │ │ img = img_orig * mask + (1. - mask) * img │
│ 687 │ │ │ │ del img_orig │
│ 688 │ │ │ │
│ > 689 │ │ │ outs = self.p_sample_plms(img, cond, ts, index=index, use │
│ 690 │ │ │ │ │ │ │ │ │ quantize_denoised=quantize_deno │
│ 691 │ │ │ │ │ │ │ │ │ noise_dropout=noise_dropout, sc │
│ 692 │ │ │ │ │ │ │ │ │ corrector_kwargs=corrector_kwar │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\autograd\grad_mode.py:27 │
│ in decorate_context │
│ │
│ 24 │ │ @functools.wraps(func) │
│ 25 │ │ def decorate_context(*args, **kwargs): │
│ 26 │ │ │ with self.clone(): │
│ > 27 │ │ │ │ return func(*args, **kwargs) │
│ 28 │ │ return cast(F, decorate_context) │
│ 29 │ │
│ 30 │ def _wrap_generator(self, func): │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\ddpm.py:753 in p_sample_plms │
│ │
│ 750 │ │ │ x_prev = a_prev.sqrt() * pred_x0 + dir_xt + noise │
│ 751 │ │ │ return x_prev, pred_x0 │
│ 752 │ │ │
│ > 753 │ │ e_t = get_model_output(x, t, speed_mp=speed_mp) │
│ 754 │ │ if len(old_eps) == 0: │
│ 755 │ │ │ # Pseudo Improved Euler (2nd order) │
│ 756 │ │ │ x_prev, pred_x0 = get_x_prev_and_pred_x0(e_t, index) │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\ddpm.py:720 in │
│ get_model_output │
│ │
│ 717 │ │ │ │ x_in = torch.cat([x] * 2) │
│ 718 │ │ │ │ t_in = torch.cat([t] * 2) │
│ 719 │ │ │ │ c_in = torch.cat([unconditional_conditioning, c]) │
│ > 720 │ │ │ │ e_t_uncond, e_t = self.apply_model(x_in, t_in, c_in, │
│ 721 │ │ │ │ e_t = e_t_uncond + unconditional_guidance_scale * (e │
│ 722 │ │ │ │
│ 723 │ │ │ if score_corrector is not None: │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\ddpm.py:483 in apply_model │
│ │
│ 480 │ │ │ self.model1.to(self.cdevice) │
│ 481 │ │ │
│ 482 │ │ step = self.unet_bs │
│ > 483 │ │ h, emb, hs = self.model1(x_noisy[0:step], t[:step], cond[:ste │
│ 484 │ │ bs = cond.shape[0] │
│ 485 │ │ │
│ 486 │ │ # assert bs%2 == 0 │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\nn\modules\module.py:1110 │
│ in _call_impl │
│ │
│ 1107 │ │ # this function, and just call forward. │
│ 1108 │ │ if not (self._backward_hooks or self._forward_hooks or self. │
│ 1109 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hook │
│ > 1110 │ │ │ return forward_call(*input, **kwargs) │
│ 1111 │ │ # Do not call functions when jit is used │
│ 1112 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1113 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\ddpm.py:323 in forward │
│ │
│ 320 │ │ self.diffusion_model = instantiate_from_config(diff_model_con │
│ 321 │ │
│ 322 │ def forward(self, x, t, cc, speed_mp): │
│ > 323 │ │ out = self.diffusion_model(x, t, context=cc, speed_mp=speed_m │
│ 324 │ │ return out │
│ 325 │
│ 326 │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\nn\modules\module.py:1110 │
│ in _call_impl │
│ │
│ 1107 │ │ # this function, and just call forward. │
│ 1108 │ │ if not (self._backward_hooks or self._forward_hooks or self. │
│ 1109 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hook │
│ > 1110 │ │ │ return forward_call(*input, **kwargs) │
│ 1111 │ │ # Do not call functions when jit is used │
│ 1112 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1113 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\openaimodelSplit.py:612 in │
│ forward │
│ │
│ 609 │ │ │
│ 610 │ │ h = x.type(self.dtype) │
│ 611 │ │ for module in self.input_blocks: │
│ > 612 │ │ │ h = module(h, emb, context, speed_mp) │
│ 613 │ │ │ hs.append(h) │
│ 614 │ │ h = self.middle_block(h, emb, context, speed_mp) │
│ 615 │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\nn\modules\module.py:1110 │
│ in _call_impl │
│ │
│ 1107 │ │ # this function, and just call forward. │
│ 1108 │ │ if not (self._backward_hooks or self._forward_hooks or self. │
│ 1109 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hook │
│ > 1110 │ │ │ return forward_call(*input, **kwargs) │
│ 1111 │ │ # Do not call functions when jit is used │
│ 1112 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1113 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ H:\22.10.24Draw\stable-diffusion\optimizedSD\openaimodelSplit.py:73 in │
│ forward │
│ │
│ 70 │ │ │ if isinstance(layer, TimestepBlock): │
│ 71 │ │ │ │ x = layer(x, emb) │
│ 72 │ │ │ elif isinstance(layer, SpatialTransformer): │
│ > 73 │ │ │ │ x = layer(x, context, speed_mp) │
│ 74 │ │ │ else: │
│ 75 │ │ │ │ x = layer(x) │
│ 76 │ │ return x │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\nn\modules\module.py:1110 │
│ in _call_impl │
│ │
│ 1107 │ │ # this function, and just call forward. │
│ 1108 │ │ if not (self._backward_hooks or self._forward_hooks or self. │
│ 1109 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hook │
│ > 1110 │ │ │ return forward_call(*input, **kwargs) │
│ 1111 │ │ # Do not call functions when jit is used │
│ 1112 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1113 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ H:\22.10.24Draw\stable-diffusion\ldm\modules\attention.py:349 in forward │
│ │
│ 346 │ │ x = self.proj_in(x) │
│ 347 │ │ x = rearrange(x, 'b c h w -> b (h w) c') │
│ 348 │ │ for block in self.transformer_blocks: │
│ > 349 │ │ │ x = block(x, speed_mp=speed_mp, context=context, fucking │
│ 350 │ │ x = rearrange(x, 'b (h w) c -> b c h w', h=h, w=w) │
│ 351 │ │ x = self.proj_out(x) │
│ 352 │ │ return x + x_in │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\nn\modules\module.py:1110 │
│ in _call_impl │
│ │
│ 1107 │ │ # this function, and just call forward. │
│ 1108 │ │ if not (self._backward_hooks or self._forward_hooks or self. │
│ 1109 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hook │
│ > 1110 │ │ │ return forward_call(*input, **kwargs) │
│ 1111 │ │ # Do not call functions when jit is used │
│ 1112 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1113 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ H:\22.10.24Draw\stable-diffusion\ldm\modules\attention.py:299 in forward │
│ │
│ 296 │ │ self.checkpoint = checkpoint │
│ 297 │ │
│ 298 │ def forward(self, x, speed_mp=None, context=None, fucking_hell=Fa │
│ > 299 │ │ return checkpoint(self._forward, (x, speed_mp, context, fucki │
│ 300 │ │
│ 301 │ def _forward(self, x, speed_mp=None, context=None, fucking_hell=F │
│ 302 │ │ x = self.attn1(self.norm1(x), speed_mp=speed_mp, dtype=x.dtyp │
│ │
│ H:\22.10.24Draw\stable-diffusion\ldm\modules\diffusionmodules\util.py:114 │
│ in checkpoint │
│ │
│ 111 │ """ │
│ 112 │ if flag: │
│ 113 │ │ args = tuple(inputs) + tuple(params) │
│ > 114 │ │ return CheckpointFunction.apply(func, len(inputs), *args) │
│ 115 │ else: │
│ 116 │ │ return func(*inputs) │
│ 117 │
│ │
│ H:\22.10.24Draw\stable-diffusion\ldm\modules\diffusionmodules\util.py:127 │
│ in forward │
│ │
│ 124 │ │ ctx.input_params = list(args[length:]) │
│ 125 │ │ │
│ 126 │ │ with torch.no_grad(): │
│ > 127 │ │ │ output_tensors = ctx.run_function(*ctx.input_tensors) │
│ 128 │ │ return output_tensors │
│ 129 │ │
│ 130 │ @staticmethod │
│ │
│ H:\22.10.24Draw\stable-diffusion\ldm\modules\attention.py:302 in _forward │
│ │
│ 299 │ │ return checkpoint(self._forward, (x, speed_mp, context, fucki │
│ 300 │ │
│ 301 │ def _forward(self, x, speed_mp=None, context=None, fucking_hell=F │
│ > 302 │ │ x = self.attn1(self.norm1(x), speed_mp=speed_mp, dtype=x.dtyp │
│ 303 │ │ x = self.attn2(self.norm2(x), speed_mp=speed_mp, context=cont │
│ 304 │ │ x = self.ff(self.norm3(x)) + x │
│ 305 │ │ return x │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\torch\nn\modules\module.py:1110 │
│ in _call_impl │
│ │
│ 1107 │ │ # this function, and just call forward. │
│ 1108 │ │ if not (self._backward_hooks or self._forward_hooks or self. │
│ 1109 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hook │
│ > 1110 │ │ │ return forward_call(*input, **kwargs) │
│ 1111 │ │ # Do not call functions when jit is used │
│ 1112 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1113 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ H:\22.10.24Draw\stable-diffusion\ldm\modules\attention.py:234 in forward │
│ │
│ 231 │ │
│ 232 │ def forward(self, x, speed_mp=None, context=None, mask=None, dtyp │
│ 233 │ │ if speed_mp: │
│ > 234 │ │ │ return self.light_forward(x, context=context, mask=mask, │
│ 235 │ │ h = self.heads │
│ 236 │ │ device = x.device │
│ 237 │ │ secondary_device = device if (self.fast_forward and sys.platf │
│ │
│ H:\22.10.24Draw\stable-diffusion\ldm\modules\attention.py:221 in │
│ light_forward │
│ │
│ 218 │ │ self._maybe_init(q) │
│ 219 │ │ │
│ 220 │ │ # actually compute the attention, what we cannot get enough o │
│ > 221 │ │ out = xformers.ops.memory_efficient_attention(q, k, v, attn_b │
│ 222 │ │ del q, k, v │
│ 223 │ │ │
│ 224 │ │ out = ( │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\xformers\ops.py:778 in │
│ memory_efficient_attention │
│ │
│ 775 │ │
│ 776 │ # fast-path that doesn't require computing the logsumexp for back │
│ 777 │ if all(x.requires_grad is False for x in [query, key, value]): │
│ > 778 │ │ return op.forward_no_grad( │
│ 779 │ │ │ query=query, key=key, value=value, attn_bias=attn_bias, p │
│ 780 │ │ ).reshape(output_shape) │
│ 781 │ return op.apply(query, key, value, attn_bias, p).reshape(output_s │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\xformers\ops.py:304 in │
│ forward_no_grad │
│ │
│ 301 │ │ attn_bias: Optional[Union[torch.Tensor, AttentionMask]], │
│ 302 │ │ p: float, │
│ 303 │ ) -> torch.Tensor: │
│ > 304 │ │ return cls.FORWARD_OPERATOR( │
│ 305 │ │ │ query=query, │
│ 306 │ │ │ key=key, │
│ 307 │ │ │ value=value, │
│ │
│ H:\Anaconda3\envs\sd_fast\lib\site-packages\xformers\ops.py:45 in │
│ no_such_operator │
│ │
│ 42 │
│ 43 def _get_xformers_operator(name: str): │
│ 44 │ def no_such_operator(*args, **kwargs): │
│ > 45 │ │ raise RuntimeError( │
│ 46 │ │ │ f"No such operator xformers::{name} - did you forget to b │
│ 47 │ │ ) │
│ 48 │
└─────────────────────────────────────────────────────────────────────────────┘
RuntimeError: No such operator xformers::efficient_attention_forward_cutlass -
did you forget to build xformers with python setup.py develop?
Process finished with exit code 1
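In case it is useful to others hitting this: the WARNING: [WinError 127] line above is, as far as I understand it, xformers failing to load its compiled extension when the package is imported, so none of the xformers operators ever get registered and memory_efficient_attention falls back to the no_such_operator stub seen in the traceback. A quick check that is independent of Stable Diffusion (just a sketch of how to probe it) is:

import torch
import xformers  # importing the package is what attempts to load the compiled extension

# The compiled kernels register themselves under the torch.ops.xformers
# namespace; if the extension failed to load (e.g. the WinError 127 above),
# this lookup raises instead of returning the operator.
print(getattr(torch.ops.xformers, "efficient_attention_forward_cutlass"))

If that raises, the prebuilt wheel's binaries are not usable on this setup (compiled extensions have to match the exact PyTorch and CUDA versions), and the hint at the bottom of the traceback, building xformers from source with python setup.py develop in a clone of the repo, is probably the way forward.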