xformers
Batch size >= 65536 in xformers.ops.memory_efficient_attention gives CUDA error.
🐛 Bug
xformers raises a CUDA error like the one below when the batch size is greater than or equal to 65536:
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
To Reproduce
Steps to reproduce the behavior:
import xformers
import xformers.ops
import torch
q = torch.zeros([65536, 16, 80]).cuda()
k = torch.zeros([65536, 16, 80]).cuda()
v = torch.zeros([65536, 16, 80]).cuda()
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None)
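A possible workaround, assuming the failure comes from CUDA's 65535 limit on the y/z grid dimensions (the batch is presumably mapped to one of them), is to split the batch into chunks below that limit and concatenate the per-chunk outputs. The helper below is a hypothetical sketch, not part of the xformers API:

```python
MAX_GRID_DIM = 65535  # assumed CUDA limit on the y/z grid dimensions

def chunk_ranges(n, max_chunk=MAX_GRID_DIM):
    """Split the batch dimension [0, n) into consecutive (start, end)
    ranges that each stay within the assumed grid-dimension limit."""
    return [(start, min(start + max_chunk, n)) for start in range(0, n, max_chunk)]

# Hypothetical usage with q, k, v as in the repro above:
#   outs = [xformers.ops.memory_efficient_attention(q[s:e], k[s:e], v[s:e])
#           for s, e in chunk_ranges(q.shape[0])]
#   out = torch.cat(outs, dim=0)
```

Whether the chunked results are numerically identical to a single large call should hold here, since attention is independent across the batch dimension.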
Expected behavior
xformers should raise a NotImplementedError or ValueError when the input sizes are not supported, instead of failing with an opaque CUDA error.
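Such a check could look like the sketch below; the limit value and its placement are assumptions based on the observed failure, not the actual xformers internals:

```python
MAX_GRID_DIM = 65535  # assumed CUDA grid-dimension limit behind the failure

def check_batch_size(batch_size):
    """Hypothetical validation that could run before the kernel launch."""
    if batch_size > MAX_GRID_DIM:
        raise ValueError(
            f"batch size {batch_size} exceeds the supported maximum of "
            f"{MAX_GRID_DIM} for memory_efficient_attention"
        )
```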
Environment
I can reproduce this with the code above on my RTX 3090 Ti with xformers 0.0.21, and on the T4 GPU in free Google Colab with xformers 0.0.22.dev599.