Error during loss backward

Open qiyan98 opened this issue 4 months ago • 0 comments

🐛 Bug

Command

I intermittently hit the following error during training:

  File "/home/qiyan/anaconda3/envs/mtr/lib/python3.8/site-packages/torch/_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "/home/qiyan/anaconda3/envs/mtr/lib/python3.8/site-packages/torch/autograd/__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/qiyan/anaconda3/envs/mtr/lib/python3.8/site-packages/torch/autograd/function.py", line 289, in apply
    return user_fn(self, *args)
  File "/home/qiyan/anaconda3/envs/mtr/lib/python3.8/site-packages/torch/autograd/function.py", line 570, in wrapper
    outputs = fn(ctx, *args)
  File "/home/qiyan/anaconda3/envs/mtr/lib/python3.8/site-packages/xformers/ops/fmha/__init__.py", line 108, in backward
    grads = _memory_efficient_attention_backward(
  File "/home/qiyan/anaconda3/envs/mtr/lib/python3.8/site-packages/xformers/ops/fmha/__init__.py", line 410, in _memory_efficient_attention_backward
    grads = op.apply(ctx, inp, grad)
  File "/home/qiyan/anaconda3/envs/mtr/lib/python3.8/site-packages/xformers/ops/fmha/cutlass.py", line 429, in apply
    (grad_q, grad_k, grad_v, grad_bias) = cls.OPERATOR(
  File "/home/qiyan/anaconda3/envs/mtr/lib/python3.8/site-packages/torch/_ops.py", line 755, in __call__
    return self._op(*args, **(kwargs or {}))
RuntimeError: p.gQ_strideM() == grad_q.stride(1) INTERNAL ASSERT FAILED at "/__w/xformers/xformers/xformers/csrc/attention/cuda/fmha/attention_backward_generic.cu":260, please report a bug to PyTorch.
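
The failing assertion compares a stride of `grad_q`, so logging the layout of the attention inputs right before the call may help narrow down a reproduction. The sketch below is only a debugging aid: `describe_qkv` is a hypothetical helper name, and the idea that the failure depends on input strides or on q/k/v sharing one buffer is an assumption, not something confirmed in this report.

```python
import torch


def describe_qkv(q, k, v):
    """Hypothetical debugging helper: summarize the layout of attention inputs."""
    base = q.untyped_storage().data_ptr()
    info = {}
    for name, t in (("q", q), ("k", k), ("v", v)):
        info[name] = {
            "shape": tuple(t.shape),
            "stride": tuple(t.stride()),
            "contiguous": t.is_contiguous(),
            # True when the tensor is a view into the same buffer as q,
            # e.g. after splitting a fused QKV projection.
            "shares_storage_with_q": t.untyped_storage().data_ptr() == base,
        }
    return info


if __name__ == "__main__":
    # Illustrative shapes only: [batch, seq_len, 3, heads, head_dim].
    fused = torch.randn(2, 16, 3, 4, 8)
    q, k, v = fused.unbind(2)
    for name, entry in describe_qkv(q, k, v).items():
        print(name, entry)
```

Calling it on the tensors passed to the attention op in the failing iteration, and again in a passing one, would show whether the layouts differ between the two.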

To Reproduce

Unfortunately, I don't have a concise reproduction snippet yet. However, when I switch my attention layers to PyTorch's torch.nn.functional.scaled_dot_product_attention and change nothing else, the problem goes away. I can reproduce the error on both an RTX 3080 Ti and an RTX 2080 Ti.
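
For readers who want to try the same swap, a minimal sketch might look like the following. It is not the code from the affected project: `attention_bmhk` is a hypothetical wrapper name, and it assumes the inputs use the `[batch, seq_len, heads, head_dim]` layout that `xformers.ops.memory_efficient_attention` accepts, with no attention bias.

```python
import torch
import torch.nn.functional as F


def attention_bmhk(q, k, v, dropout_p=0.0):
    """Hypothetical stand-in for an xformers attention call.

    Assumes q, k, v are laid out as [batch, seq_len, heads, head_dim].
    """
    # SDPA expects [batch, heads, seq_len, head_dim], so swap dims 1 and 2.
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
    # Return to the [batch, seq_len, heads, head_dim] layout.
    return out.transpose(1, 2)


if __name__ == "__main__":
    # Illustrative shapes only.
    q = torch.randn(2, 16, 4, 8, requires_grad=True)
    k = torch.randn(2, 16, 4, 8, requires_grad=True)
    v = torch.randn(2, 16, 4, 8, requires_grad=True)
    out = attention_bmhk(q, k, v)
    out.sum().backward()
    print(out.shape, q.grad.shape)
```

Note that the returned tensor is a transposed view, so downstream code that requires a contiguous tensor may need an explicit `.contiguous()` call.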

I noticed a previous issue, #628, which appears to have been resolved, so I am not sure why the error still appears in my code. Any help is appreciated.

Environment

> python -m xformers.info
Unable to find python bindings at /usr/local/dcgm/bindings/python3. No data will be captured.
xFormers 0.0.24+cu118
memory_efficient_attention.cutlassF:               available
memory_efficient_attention.cutlassB:               available
memory_efficient_attention.decoderF:               available
memory_efficient_attention.flshattF@…:             available
memory_efficient_attention.flshattB@…:             available
memory_efficient_attention.smallkF:                available
memory_efficient_attention.smallkB:                available
memory_efficient_attention.tritonflashattF:        unavailable
memory_efficient_attention.tritonflashattB:        unavailable
memory_efficient_attention.triton_splitKF:         unavailable
indexing.scaled_index_addF:                        unavailable
indexing.scaled_index_addB:                        unavailable
indexing.index_select:                             unavailable
sequence_parallel_fused.write_values:              unavailable
sequence_parallel_fused.wait_values:               unavailable
sequence_parallel_fused.cuda_memset_32b_async:     unavailable
sp24.sparse24_sparsify_both_ways:                  available
sp24.sparse24_apply:                               available
sp24.sparse24_apply_dense_output:                  available
sp24._sparse24_gemm:                               available
sp24._cslt_sparse_mm@…:                            available
swiglu.dual_gemm_silu:                             available
swiglu.gemm_fused_operand_sum:                     available
swiglu.fused.p.cpp:                                available
is_triton_available:                               False
pytorch.version:                                   2.2.0
pytorch.cuda:                                      available
gpu.compute_capability:                            7.5
gpu.name:                                          NVIDIA GeForce RTX 2080 Ti
dcgm_profiler:                                     unavailable
build.info:                                        available
build.cuda_version:                                1108
build.python_version:                              3.8.18
build.torch_version:                               2.2.0+cu118
build.env.TORCH_CUDA_ARCH_LIST:                    5.0+PTX 6.0 6.1 7.0 7.5 8.0+PTX 9.0
build.env.XFORMERS_BUILD_TYPE:                     Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   wheel-v0.0.24
build.nvcc_version:                                11.8.89
source.privacy:                                    open source

qiyan98, Mar 05 '24 09:03