
Could not run xformers::efficient_attention_forward_cutlass

wankio opened this issue 1 year ago · 9 comments

venv "C:\Users\GEN32UC\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]
Commit hash: cbf6dad02d04d98e5a2d5e870777ab99b5796b2d
Installing requirements for Web UI
Launching Web UI with arguments: --listen --always-batch-cond-uncond --precision full --no-half --opt-split-attention --force-enable-xformers
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loading weights [7460a6fa] from C:\Users\GEN32UC\stable-diffusion-webui\models\Stable-diffusion\model.ckpt
Global Step: 470000
Applying xformers cross attention optimization.
Model loaded.
Loading hypernetwork None
Loaded a total of 6 textual inversion embeddings.
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
  0%|                                                                                           | 0/20 [00:01<?, ?it/s]
Error completing request
Arguments: ('cat', '', 'None', 'None', 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, False, 0.7, 0, False, False, None, '', 1, '', 4, '', True, False) {}
Traceback (most recent call last):
  File "C:\Users\GEN32UC\stable-diffusion-webui\modules\ui.py", line 176, in f
    res = list(func(*args, **kwargs))
  File "C:\Users\GEN32UC\stable-diffusion-webui\webui.py", line 68, in f
    res = func(*args, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\modules\txt2img.py", line 43, in txt2img
    processed = process_images(p)
  File "C:\Users\GEN32UC\stable-diffusion-webui\modules\processing.py", line 391, in process_images
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength)
  File "C:\Users\GEN32UC\stable-diffusion-webui\modules\processing.py", line 518, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning)
  File "C:\Users\GEN32UC\stable-diffusion-webui\modules\sd_samplers.py", line 399, in sample
    samples = self.func(self.model_wrap_cfg, x, extra_args={'cond': conditioning, 'uncond': unconditional_conditioning, 'cond_scale': p.cfg_scale}, disable=False, callback=self.callback_state, **extra_params_kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 80, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\modules\sd_samplers.py", line 239, in forward
    x_out = self.inner_model(x_in, sigma_in, cond=cond_in)
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\models\diffusion\ddpm.py", line 987, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\models\diffusion\ddpm.py", line 1410, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\diffusionmodules\openaimodel.py", line 732, in forward
    h = module(h, emb, context)
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\diffusionmodules\openaimodel.py", line 85, in forward
    x = layer(x, context)
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\attention.py", line 258, in forward
    x = block(x, context=context)
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\attention.py", line 209, in forward
    return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\diffusionmodules\util.py", line 114, in checkpoint
    return CheckpointFunction.apply(func, len(inputs), *args)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\diffusionmodules\util.py", line 127, in forward
    output_tensors = ctx.run_function(*ctx.input_tensors)
  File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\attention.py", line 212, in _forward
    x = self.attn1(self.norm1(x)) + x
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\GEN32UC\stable-diffusion-webui\modules\sd_hijack_optimizations.py", line 145, in xformers_attention_forward
    out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None)
  File "c:\users\gen32uc\stable-diffusion-webui\repositories\xformers\xformers\ops.py", line 862, in memory_efficient_attention
    return op.forward_no_grad(
  File "c:\users\gen32uc\stable-diffusion-webui\repositories\xformers\xformers\ops.py", line 305, in forward_no_grad
    return cls.FORWARD_OPERATOR(
  File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\_ops.py", line 143, in __call__
    return self._op(*args, **kwargs or {})
NotImplementedError: Could not run 'xformers::efficient_attention_forward_cutlass' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'xformers::efficient_attention_forward_cutlass' is only available for these backends: [UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].

BackendSelect: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\PythonFallbackKernel.cpp:133 [backend fallback]
Named: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\ZeroTensorFallback.cpp:86 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at C:\Users\circleci\project\functorch\csrc\DynamicLayer.cpp:487 [backend fallback]
ADInplaceOrView: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:47 [backend fallback]
AutogradXLA: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:51 [backend fallback]
AutogradMPS: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:59 [backend fallback]
AutogradXPU: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:43 [backend fallback]
AutogradHPU: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:68 [backend fallback]
AutogradLazy: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:55 [backend fallback]
Tracer: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\autograd\TraceTypeManual.cpp:295 [backend fallback]
AutocastCPU: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\autocast_mode.cpp:481 [backend fallback]
Autocast: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\autocast_mode.cpp:324 [backend fallback]
FuncTorchBatched: registered at C:\Users\circleci\project\functorch\csrc\LegacyBatchingRegistrations.cpp:661 [backend fallback]
FuncTorchVmapMode: fallthrough registered at C:\Users\circleci\project\functorch\csrc\VmapModeRegistrations.cpp:24 [backend fallback]
Batched: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at C:\Users\circleci\project\functorch\csrc\TensorWrapper.cpp:187 [backend fallback]
Functionalize: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\FunctionalizeFallbackKernel.cpp:89 [backend fallback]
PythonTLSSnapshot: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\PythonFallbackKernel.cpp:137 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at C:\Users\circleci\project\functorch\csrc\DynamicLayer.cpp:483 [backend fallback]

Today I decided to try xformers. After many failed installs it finally installed successfully, but when I press Generate I just get the error above. CUDA is the latest version; before I installed xformers and ran with this flag, everything worked normally.

wankio avatar Oct 09 '22 12:10 wankio

Do you have Cutlass installed?

conda install cutlass

or

pip install cutlass

Either you can try installing Cutlass, or you can uninstall xformers:

pip uninstall xformers
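
If you want to confirm whether your xformers build actually ships CUDA kernels, here is a minimal check (my own sketch, not part of the webui; it assumes a CUDA build of PyTorch and should be run inside the webui's venv):

# A CPU-only xformers build raises the same NotImplementedError as above:
python -c "import torch, xformers.ops; q = torch.randn(1, 16, 8, device='cuda', dtype=torch.float16); xformers.ops.memory_efficient_attention(q, q, q); print('xformers CUDA kernels OK')"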

Thomas-MMJ avatar Oct 09 '22 17:10 Thomas-MMJ

Well, I just deleted the xformers folder and recompiled with TORCH_CUDA_ARCH_LIST set, and it works now. I think installing into the existing folder (even though it had nothing inside) caused the problem.
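
For reference, the clean rebuild described above looks roughly like this (my sketch of the steps; the 8.6 value is an example, substitute your GPU's compute capability):

# Remove the stale checkout and rebuild the CUDA extension from scratch
rm -rf repositories/xformers
git clone https://github.com/facebookresearch/xformers repositories/xformers
cd repositories/xformers
git submodule update --init --recursive
export FORCE_CUDA=1
export TORCH_CUDA_ARCH_LIST=8.6   # compute capability, e.g. 8.6 for RTX 30xx
pip install --verbose --no-deps -e .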

wankio avatar Oct 10 '22 04:10 wankio

If anyone is having trouble with this in Docker, the following helped me (change TORCH_CUDA_ARCH_LIST to your GPU's value; 8.6 is for the RTX 3060):

RUN git clone https://github.com/facebookresearch/xformers/ repositories/xformers && cd repositories/xformers && git submodule update --init --recursive

RUN apt install -y g++
RUN cd repositories/xformers && \
    export FORCE_CUDA="1" && \
    export TORCH_CUDA_ARCH_LIST=8.6 && \
    CUDA_VISIBLE_DEVICES=0 pip install --verbose --no-deps -e .
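
One optional extra step (my addition; it assumes PyTorch is already installed in the image) makes the image build fail early if the extension did not compile:

# Import the freshly built package so a broken build aborts the image build
RUN python -c "import xformers; print('xformers', xformers.__version__)"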

luckyycode avatar Oct 11 '22 00:10 luckyycode

> If anyone is having trouble with this in Docker, the following helped me (change TORCH_CUDA_ARCH_LIST to your GPU's value; 8.6 is for the RTX 3060): […]

I have a 3090 Ti on Ubuntu 20.04.1 and ran `export FORCE_CUDA="1" && export TORCH_CUDA_ARCH_LIST=11.6 && CUDA_VISIBLE_DEVICES=0 pip install --verbose --no-deps -e .`, but I got this error:

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///home/kai/my_download/stable-diffusion-webui/repositories/xformers
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [10 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/home/kai/my_download/stable-diffusion-webui/repositories/xformers/setup.py", line 304, in <module>
          ext_modules=get_extensions(),
        File "/home/kai/my_download/stable-diffusion-webui/repositories/xformers/setup.py", line 251, in get_extensions
          ext_modules += get_flash_attention_extensions(
        File "/home/kai/my_download/stable-diffusion-webui/repositories/xformers/setup.py", line 117, in get_flash_attention_extensions
          num = 10 * int(arch[0]) + int(arch[2])
      ValueError: invalid literal for int() with base 10: '.'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

kkimmm avatar Dec 13 '22 03:12 kkimmm

> If anyone is having trouble with this in Docker, the following helped me (change TORCH_CUDA_ARCH_LIST to your GPU's value; 8.6 is for the RTX 3060): […]

If you're on a local Ubuntu or Ubuntu Desktop instance, please see this issue first instead: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/4942. I will add details there of some cleanup I had to do after attempting the fix from this PR. cc @kkimmm

jpollard-cs avatar Dec 21 '22 14:12 jpollard-cs

Also, I'm unsure why you'd want to set CUDA_VISIBLE_DEVICES to 0 unless you don't have any NVIDIA GPUs (though you indicated you have an RTX 3060). If I understand correctly, this would result in a build that doesn't leverage your GPU.

jpollard-cs avatar Dec 21 '22 14:12 jpollard-cs

> Also, I'm unsure why you'd want to set CUDA_VISIBLE_DEVICES to 0 unless you don't have any NVIDIA GPUs (though you indicated you have an RTX 3060). If I understand correctly, this would result in a build that doesn't leverage your GPU.

CUDA_VISIBLE_DEVICES is a list of CUDA device ID slots; devices are numbered 0, 1, 2, etc., so 0 selects the first GPU rather than disabling the GPU.

https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/
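
For example (launch.py stands in for whatever entry point you run; the device IDs are illustrative):

CUDA_VISIBLE_DEVICES=0   python launch.py   # first GPU only
CUDA_VISIBLE_DEVICES=1   python launch.py   # second GPU only
CUDA_VISIBLE_DEVICES=0,1 python launch.py   # both GPUs
CUDA_VISIBLE_DEVICES=""  python launch.py   # hide all GPUs (forces CPU)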

Thomas-MMJ avatar Dec 21 '22 17:12 Thomas-MMJ

>> Also, I'm unsure why you'd want to set CUDA_VISIBLE_DEVICES to 0 […]
>
> CUDA_VISIBLE_DEVICES is a list of CUDA device ID slots; devices are numbered 0, 1, 2, etc. […]

Ah okay got it. Looks like I read some misguided information on this. Thanks for the clarification @Thomas-MMJ

jpollard-cs avatar Dec 22 '22 15:12 jpollard-cs

@kkimmm

> I have a 3090 Ti on Ubuntu 20.04.1 and ran `export FORCE_CUDA="1" && export TORCH_CUDA_ARCH_LIST=11.6 && CUDA_VISIBLE_DEVICES=0 pip install --verbose --no-deps -e .`, but it failed with `ValueError: invalid literal for int() with base 10: '.'` […]

I believe the arch list is not meant to be your CUDA version (11.6) but your GPU's compute capability (8.6 for a 3090 Ti); refer to https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list
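
If you're unsure of the right value, PyTorch can report the compute capability of every visible device (a quick one-liner, assuming a CUDA build of torch):

# Prints e.g. [(8, 6)] for an RTX 3090 Ti -- use it as TORCH_CUDA_ARCH_LIST=8.6
python -c "import torch; print([torch.cuda.get_device_capability(i) for i in range(torch.cuda.device_count())])"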

chris-aeviator avatar Jan 06 '23 14:01 chris-aeviator

So how do you fix this?

kopyl avatar Apr 01 '23 06:04 kopyl

Closing as stale.

catboxanon avatar Aug 03 '23 18:08 catboxanon