xformers
Batch size >= 65536 in xformers.ops.memory_efficient_attention gives CUDA error.
🐛 Bug
xformers raises a CUDA error like the one below when the batch size is greater than or equal to 65536:
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
To Reproduce
Steps to reproduce the behavior:
import xformers
import xformers.ops
import torch
q = torch.zeros([65536, 16, 80]).cuda()
k = torch.zeros([65536, 16, 80]).cuda()
v = torch.zeros([65536, 16, 80]).cuda()
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None)
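A possible workaround, assuming the failure comes from CUDA's 65535 limit on the y/z grid dimensions (the batch is presumably mapped to one of them), is to split the batch into chunks below that limit and concatenate the per-chunk outputs. The helper below is a hypothetical sketch, not part of the xformers API:

```python
MAX_GRID_DIM = 65535  # assumed CUDA limit on the y/z grid dimensions

def chunk_ranges(n, max_chunk=MAX_GRID_DIM):
    """Split the batch dimension [0, n) into consecutive (start, end)
    ranges that each stay within the assumed grid-dimension limit."""
    return [(start, min(start + max_chunk, n)) for start in range(0, n, max_chunk)]

# Hypothetical usage with q, k, v as in the repro above:
#   outs = [xformers.ops.memory_efficient_attention(q[s:e], k[s:e], v[s:e])
#           for s, e in chunk_ranges(q.shape[0])]
#   out = torch.cat(outs, dim=0)
```

Whether the chunked results are numerically identical to a single large call should hold here, since attention is independent across the batch dimension.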
Expected behavior
xformers should raise a NotImplementedError or ValueError when the input sizes are not supported, instead of failing with an opaque CUDA error.
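Such a check could look like the sketch below; the limit value and its placement are assumptions based on the observed failure, not the actual xformers internals:

```python
MAX_GRID_DIM = 65535  # assumed CUDA grid-dimension limit behind the failure

def check_batch_size(batch_size):
    """Hypothetical validation that could run before the kernel launch."""
    if batch_size > MAX_GRID_DIM:
        raise ValueError(
            f"batch size {batch_size} exceeds the supported maximum of "
            f"{MAX_GRID_DIM} for memory_efficient_attention"
        )
```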
Environment
I can reproduce this with the code above on my RTX 3090 Ti with xformers 0.0.21, and on the T4 GPU in free Google Colab with xformers 0.0.22.dev599.