xformers
Error during loss backward
🐛 Bug
Command
I intermittently encounter the following error during training:
File "/home/qiyan/anaconda3/envs/mtr/lib/python3.8/site-packages/torch/_tensor.py", line 522, in backward
torch.autograd.backward(
File "/home/qiyan/anaconda3/envs/mtr/lib/python3.8/site-packages/torch/autograd/__init__.py", line 266, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/qiyan/anaconda3/envs/mtr/lib/python3.8/site-packages/torch/autograd/function.py", line 289, in apply
return user_fn(self, *args)
File "/home/qiyan/anaconda3/envs/mtr/lib/python3.8/site-packages/torch/autograd/function.py", line 570, in wrapper
outputs = fn(ctx, *args)
File "/home/qiyan/anaconda3/envs/mtr/lib/python3.8/site-packages/xformers/ops/fmha/__init__.py", line 108, in backward
grads = _memory_efficient_attention_backward(
File "/home/qiyan/anaconda3/envs/mtr/lib/python3.8/site-packages/xformers/ops/fmha/__init__.py", line 410, in _memory_efficient_attention_backward
grads = op.apply(ctx, inp, grad)
File "/home/qiyan/anaconda3/envs/mtr/lib/python3.8/site-packages/xformers/ops/fmha/cutlass.py", line 429, in apply
(grad_q, grad_k, grad_v, grad_bias) = cls.OPERATOR(
File "/home/qiyan/anaconda3/envs/mtr/lib/python3.8/site-packages/torch/_ops.py", line 755, in __call__
return self._op(*args, **(kwargs or {}))
RuntimeError: p.gQ_strideM() == grad_q.stride(1) INTERNAL ASSERT FAILED at "/__w/xformers/xformers/xformers/csrc/attention/cuda/fmha/attention_backward_generic.cu":260, please report a bug to PyTorch.
To Reproduce
Unfortunately, I don't yet have a concise reproduction snippet. However, when I switch my attention layers to PyTorch's torch.nn.functional.scaled_dot_product_attention and change nothing else, the problem goes away. I can reproduce the error on both an RTX 3080 Ti and an RTX 2080 Ti.
I noticed a previous issue, #628, in which the problem appears to have been resolved, so I am not sure why it still shows up in my code. Any help is appreciated.
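For anyone who wants to try the same swap, here is a minimal sketch of the workaround described above. The helper name `sdpa_bmhk` is hypothetical (it is not part of xformers or PyTorch), and the sketch assumes the attention layers pass tensors in the [B, M, H, K] layout that xformers' memory_efficient_attention accepts. F.scaled_dot_product_attention expects [B, H, M, K], hence the transposes. Attention biases and dropout are not handled here.

```python
import torch
import torch.nn.functional as F


def sdpa_bmhk(q, k, v):
    """Hypothetical helper: attention on [batch, seq_len, heads, head_dim] inputs.

    Assumes the xformers-style [B, M, H, K] layout on input and output, and
    routes the computation through PyTorch's scaled_dot_product_attention,
    which works on [B, H, M, K].
    """
    # [B, M, H, K] -> [B, H, M, K]
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v)
    # Back to [B, M, H, K]; .contiguous() returns a densely laid out tensor.
    return out.transpose(1, 2).contiguous()


# Small CPU smoke test: forward and backward both run.
q = torch.randn(2, 16, 4, 8, requires_grad=True)
k = torch.randn(2, 16, 4, 8, requires_grad=True)
v = torch.randn(2, 16, 4, 8, requires_grad=True)
out = sdpa_bmhk(q, k, v)
out.sum().backward()
print(out.shape, q.grad.shape)
```

This only sidesteps the failing cutlass backward kernel; it does not explain why the stride assertion fires.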
Environment
> python -m xformers.info
Unable to find python bindings at /usr/local/dcgm/bindings/python3. No data will be captured.
xFormers 0.0.24+cu118
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.decoderF: available
memory_efficient_attention.flshattF@…: available
memory_efficient_attention.flshattB@…: available
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.tritonflashattF: unavailable
memory_efficient_attention.tritonflashattB: unavailable
memory_efficient_attention.triton_splitKF: unavailable
indexing.scaled_index_addF: unavailable
indexing.scaled_index_addB: unavailable
indexing.index_select: unavailable
sequence_parallel_fused.write_values: unavailable
sequence_parallel_fused.wait_values: unavailable
sequence_parallel_fused.cuda_memset_32b_async: unavailable
sp24.sparse24_sparsify_both_ways: available
sp24.sparse24_apply: available
sp24.sparse24_apply_dense_output: available
sp24._sparse24_gemm: available
sp24._cslt_sparse_mm@…: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: False
pytorch.version: 2.2.0
pytorch.cuda: available
gpu.compute_capability: 7.5
gpu.name: NVIDIA GeForce RTX 2080 Ti
dcgm_profiler: unavailable
build.info: available
build.cuda_version: 1108
build.python_version: 3.8.18
build.torch_version: 2.2.0+cu118
build.env.TORCH_CUDA_ARCH_LIST: 5.0+PTX 6.0 6.1 7.0 7.5 8.0+PTX 9.0
build.env.XFORMERS_BUILD_TYPE: Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: wheel-v0.0.24
build.nvcc_version: 11.8.89
source.privacy: open source
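To compare this report against the one from the RTX 3080 Ti machine, it may help to load the text into a dict first. The sketch below is only an illustration: `parse_xformers_info` is a hypothetical helper, not something xformers provides, and it assumes each entry has the `key: value` shape seen above. Lines without a colon, such as the version banner and the dcgm warning, are skipped.

```python
def parse_xformers_info(report: str) -> dict:
    """Hypothetical helper: turn `python -m xformers.info` text into a dict."""
    info = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Skip lines that are not `key: value` entries (banner, warnings).
        if not sep or not key or " " in key:
            continue
        info[key] = value.strip()
    return info


sample = """xFormers 0.0.24+cu118
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.tritonflashattB: unavailable
is_triton_available: False
gpu.compute_capability: 7.5
build.env.TORCH_CUDA_ARCH_LIST: 5.0+PTX 6.0 6.1 7.0 7.5 8.0+PTX 9.0
"""

info = parse_xformers_info(sample)
# List the attention ops that this build reports as unavailable.
missing = [k for k, v in info.items()
           if k.startswith("memory_efficient_attention.") and v == "unavailable"]
print(missing)
```

Diffing two such dicts makes it easy to see whether the two GPUs differ in anything other than compute capability.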