stable-diffusion-webui
Could not run xformers::efficient_attention_forward_cutlass
venv "C:\Users\GEN32UC\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:48:03) [MSC v.1928 64 bit (AMD64)]
Commit hash: cbf6dad02d04d98e5a2d5e870777ab99b5796b2d
Installing requirements for Web UI
Launching Web UI with arguments: --listen --always-batch-cond-uncond --precision full --no-half --opt-split-attention --force-enable-xformers
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loading weights [7460a6fa] from C:\Users\GEN32UC\stable-diffusion-webui\models\Stable-diffusion\model.ckpt
Global Step: 470000
Applying xformers cross attention optimization.
Model loaded.
Loading hypernetwork None
Loaded a total of 6 textual inversion embeddings.
Running on local URL: http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.
0%| | 0/20 [00:01<?, ?it/s]
Error completing request
Arguments: ('cat', '', 'None', 'None', 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, False, 0.7, 0, False, False, None, '', 1, '', 4, '', True, False) {}
Traceback (most recent call last):
File "C:\Users\GEN32UC\stable-diffusion-webui\modules\ui.py", line 176, in f
res = list(func(*args, **kwargs))
File "C:\Users\GEN32UC\stable-diffusion-webui\webui.py", line 68, in f
res = func(*args, **kwargs)
File "C:\Users\GEN32UC\stable-diffusion-webui\modules\txt2img.py", line 43, in txt2img
processed = process_images(p)
File "C:\Users\GEN32UC\stable-diffusion-webui\modules\processing.py", line 391, in process_images
samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength)
File "C:\Users\GEN32UC\stable-diffusion-webui\modules\processing.py", line 518, in sample
samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning)
File "C:\Users\GEN32UC\stable-diffusion-webui\modules\sd_samplers.py", line 399, in sample
samples = self.func(self.model_wrap_cfg, x, extra_args={'cond': conditioning, 'uncond': unconditional_conditioning, 'cond_scale': p.cfg_scale}, disable=False, callback=self.callback_state, **extra_params_kwargs)
File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 80, in sample_euler_ancestral
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\GEN32UC\stable-diffusion-webui\modules\sd_samplers.py", line 239, in forward
x_out = self.inner_model(x_in, sigma_in, cond=cond_in)
File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
return self.inner_model.apply_model(*args, **kwargs)
File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\models\diffusion\ddpm.py", line 987, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\models\diffusion\ddpm.py", line 1410, in forward
out = self.diffusion_model(x, t, context=cc)
File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\diffusionmodules\openaimodel.py", line 732, in forward
h = module(h, emb, context)
File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\diffusionmodules\openaimodel.py", line 85, in forward
x = layer(x, context)
File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\attention.py", line 258, in forward
x = block(x, context=context)
File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\attention.py", line 209, in forward
return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\diffusionmodules\util.py", line 114, in checkpoint
return CheckpointFunction.apply(func, len(inputs), *args)
File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\diffusionmodules\util.py", line 127, in forward
output_tensors = ctx.run_function(*ctx.input_tensors)
File "C:\Users\GEN32UC\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\attention.py", line 212, in _forward
x = self.attn1(self.norm1(x)) + x
File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\GEN32UC\stable-diffusion-webui\modules\sd_hijack_optimizations.py", line 145, in xformers_attention_forward
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None)
File "c:\users\gen32uc\stable-diffusion-webui\repositories\xformers\xformers\ops.py", line 862, in memory_efficient_attention
return op.forward_no_grad(
File "c:\users\gen32uc\stable-diffusion-webui\repositories\xformers\xformers\ops.py", line 305, in forward_no_grad
return cls.FORWARD_OPERATOR(
File "C:\Users\GEN32UC\stable-diffusion-webui\venv\lib\site-packages\torch\_ops.py", line 143, in __call__
return self._op(*args, **kwargs or {})
NotImplementedError: Could not run 'xformers::efficient_attention_forward_cutlass' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'xformers::efficient_attention_forward_cutlass' is only available for these backends: [UNKNOWN_TENSOR_TYPE_ID, QuantizedXPU, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseCPU, SparseCUDA, SparseHIP, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, SparseVE, UNKNOWN_TENSOR_TYPE_ID, NestedTensorCUDA, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID, UNKNOWN_TENSOR_TYPE_ID].
BackendSelect: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\PythonFallbackKernel.cpp:133 [backend fallback]
Named: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\ZeroTensorFallback.cpp:86 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at C:\Users\circleci\project\functorch\csrc\DynamicLayer.cpp:487 [backend fallback]
ADInplaceOrView: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:47 [backend fallback]
AutogradXLA: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:51 [backend fallback]
AutogradMPS: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:59 [backend fallback]
AutogradXPU: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:43 [backend fallback]
AutogradHPU: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:68 [backend fallback]
AutogradLazy: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\VariableFallbackKernel.cpp:55 [backend fallback]
Tracer: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\autograd\TraceTypeManual.cpp:295 [backend fallback]
AutocastCPU: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\autocast_mode.cpp:481 [backend fallback]
Autocast: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\autocast_mode.cpp:324 [backend fallback]
FuncTorchBatched: registered at C:\Users\circleci\project\functorch\csrc\LegacyBatchingRegistrations.cpp:661 [backend fallback]
FuncTorchVmapMode: fallthrough registered at C:\Users\circleci\project\functorch\csrc\VmapModeRegistrations.cpp:24 [backend fallback]
Batched: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at C:\Users\circleci\project\functorch\csrc\TensorWrapper.cpp:187 [backend fallback]
Functionalize: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\FunctionalizeFallbackKernel.cpp:89 [backend fallback]
PythonTLSSnapshot: registered at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\core\PythonFallbackKernel.cpp:137 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at C:\Users\circleci\project\functorch\csrc\DynamicLayer.cpp:483 [backend fallback]
Today I decided to try xformers. After many failed installs, it finally installed successfully, but when I press Generate I get the error above. CUDA is the latest version. Before xformers was installed and enabled with the command-line flag, everything worked normally.
Do you have CUTLASS installed?
conda install cutlass
or
pip install cutlass
Either try installing CUTLASS, or uninstall xformers:
pip uninstall xformers
Well, I just deleted the xformers folder and recompiled it with TORCH_CUDA_ARCH_LIST set, and it works now. I think repeatedly installing into the existing folder (even though it didn't seem to have anything inside) caused the problem.
If anyone is having trouble with this in Docker, this helped me (change TORCH_CUDA_ARCH_LIST to your value; 8.6 is for the RTX 3060):
RUN git clone https://github.com/facebookresearch/xformers/ repositories/xformers && cd repositories/xformers && git submodule update --init --recursive
RUN apt-get update && apt-get install -y g++
RUN cd repositories/xformers && \
export FORCE_CUDA="1" && \
export TORCH_CUDA_ARCH_LIST=8.6 && \
CUDA_VISIBLE_DEVICES=0 pip install --verbose --no-deps -e .
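The recipe above hard-codes the architecture. Below is a minimal Python sketch of deriving it instead, assuming PyTorch is installed in the environment that builds xformers: `torch.cuda.get_device_capability()` returns the GPU's (major, minor) compute capability, which is then formatted as the TORCH_CUDA_ARCH_LIST value. The helper names are hypothetical, and the actual `pip install` call is only shown as a comment. A plain `docker build` normally has no GPU access, so inside a Dockerfile a hard-coded value such as 8.6 is probably still needed; the sketch falls back to it.

```python
import os
from typing import Dict, Optional, Tuple


def arch_from_capability(capability: Tuple[int, int]) -> str:
    """Format a (major, minor) compute capability for TORCH_CUDA_ARCH_LIST, e.g. (8, 6) -> "8.6"."""
    major, minor = capability
    return f"{major}.{minor}"


def detect_capability(device: int = 0) -> Optional[Tuple[int, int]]:
    """Ask PyTorch for the compute capability of `device`; None if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device)


def build_env(capability: Tuple[int, int]) -> Dict[str, str]:
    """Hypothetical helper: environment for building xformers from source with CUDA kernels."""
    env = dict(os.environ)
    env["FORCE_CUDA"] = "1"  # build the CUDA extensions even if no GPU is detected
    env["TORCH_CUDA_ARCH_LIST"] = arch_from_capability(capability)
    return env


if __name__ == "__main__":
    # Fall back to the RTX 3060 value from the post when no GPU is visible (e.g. docker build).
    cap = detect_capability() or (8, 6)
    env = build_env(cap)
    print("TORCH_CUDA_ARCH_LIST =", env["TORCH_CUDA_ARCH_LIST"])
    # The build itself is not run here; with `import subprocess, sys` it would look like:
    # subprocess.run([sys.executable, "-m", "pip", "install", "--verbose", "--no-deps", "-e", "."],
    #                cwd="repositories/xformers", env=env, check=True)
```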
I have an RTX 3090 Ti on Ubuntu 20.04.1 and ran:
export FORCE_CUDA="1" && export TORCH_CUDA_ARCH_LIST=11.6 && CUDA_VISIBLE_DEVICES=0 pip install --verbose --no-deps -e .
but I got an error:
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///home/kai/my_download/stable-diffusion-webui/repositories/xformers
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [10 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/home/kai/my_download/stable-diffusion-webui/repositories/xformers/setup.py", line 304, in <module>
ext_modules=get_extensions(),
File "/home/kai/my_download/stable-diffusion-webui/repositories/xformers/setup.py", line 251, in get_extensions
ext_modules += get_flash_attention_extensions(
File "/home/kai/my_download/stable-diffusion-webui/repositories/xformers/setup.py", line 117, in get_flash_attention_extensions
num = 10 * int(arch[0]) + int(arch[2])
ValueError: invalid literal for int() with base 10: '.'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
If you're on a local Ubuntu or Ubuntu Desktop instance, please see this issue first instead: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/4942. I will add details there about some cleanup I had to do after attempting the fix from this PR. cc @kkimmm
Also, I'm unsure why you'd want to set CUDA_VISIBLE_DEVICES to 0 unless you don't have any NVIDIA GPUs (you indicated you have an RTX 3060). If I understand correctly, this would result in a build that does not leverage your GPU.
CUDA_VISIBLE_DEVICES is a list of CUDA device IDs (see https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/). They are numbered 0, 1, 2, etc., so CUDA_VISIBLE_DEVICES=0 simply selects the first GPU; it does not disable the GPU.
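To make the device-ID point concrete, here is a simplified Python model of how a CUDA process enumerates GPUs for a given CUDA_VISIBLE_DEVICES value. It is a sketch of the documented rules as I understand them (comma-separated indices, visible GPUs renumbered in listed order, an invalid index hiding itself and everything after it, an empty value hiding all GPUs), not NVIDIA's implementation; GPU UUID entries and duplicate indices are not modelled, and `visible_devices` is a hypothetical helper.

```python
from typing import List, Optional


def visible_devices(value: Optional[str], physical_count: int) -> List[int]:
    """Physical GPU indices, in the order CUDA would expose them as devices 0, 1, 2, ..."""
    if value is None:  # variable unset: every GPU is visible
        return list(range(physical_count))
    result: List[int] = []
    for token in value.split(","):
        try:
            index = int(token.strip())
        except ValueError:
            break  # non-integer entry (empty string, or a GPU UUID, which is not modelled): stop
        if index < 0 or index >= physical_count:
            break  # invalid index: it and every entry after it are hidden
        result.append(index)
    return result


if __name__ == "__main__":
    # Single-GPU machine (e.g. one RTX 3060): "0" keeps that GPU visible as device 0.
    print(visible_devices("0", 1))  # [0]
    # An empty value hides every GPU.
    print(visible_devices("", 1))  # []
```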
Ah okay got it. Looks like I read some misguided information on this. Thanks for the clarification @Thomas-MMJ
@kkimmm
I believe TORCH_CUDA_ARCH_LIST is not meant to be your CUDA version; it should be your GPU's compute capability (8.6 for an RTX 3090 Ti). Refer to https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list
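The ValueError in the log can be reproduced in isolation. This sketch copies only the single expression from the traceback (`setup.py`, line 117), not the rest of xformers' build script, and `parse_like_setup_py` is a hypothetical wrapper name. The expression reads characters 0 and 2 of each entry, so it assumes a single-digit major version like "8.6"; the CUDA toolkit version "11.6" puts a "." at index 2.

```python
def parse_like_setup_py(arch: str) -> int:
    # Same expression as the failing line in xformers' setup.py:
    # only characters 0 and 2 are read, e.g. "8.6" -> 10 * 8 + 6 = 86.
    return 10 * int(arch[0]) + int(arch[2])


if __name__ == "__main__":
    print(parse_like_setup_py("8.6"))  # 86: compute capability of an RTX 3060 / 3090 Ti
    try:
        parse_like_setup_py("11.6")  # a CUDA toolkit version, not a compute capability
    except ValueError as exc:
        print("ValueError:", exc)  # ValueError: invalid literal for int() with base 10: '.'
```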
So how do I fix it?
Closing as stale.