ColossalAI
[BUG]: FusedScaleMaskSoftmax last dimension does not sum to 1
🐛 Describe the bug
I use the following code to test the softmax, but the result does not sum to one along the last dimension:
import math
import torch
from colossalai import kernel

attention_head_size = 32
softmax = kernel.FusedScaleMaskSoftmax(
    input_in_fp16=True,
    input_in_bf16=False,
    attn_mask_type=None,
    scaled_masked_softmax_fusion=True,
    mask_func=lambda x, mask: x.masked_fill(mask, -50000),
    softmax_in_fp32=True,
    scale=1 / math.sqrt(attention_head_size),
)

length = 200
b = 1
h = 4
hidden_states = torch.randn(b, h, length, length).half()
mask = torch.rand(1, 1, length, length) > 0.5

print(softmax.is_kernel_available(mask, b, h, length, length))
output = softmax(hidden_states, mask)
print(output[0, 0, 0].sum())  # prints something like tensor(1.1623e-05, dtype=torch.float16), not 1
However, if I purposely change the head count so that the fused kernel is not used, the result does sum to one (see the plain-PyTorch reference check below).
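For reference, here is a minimal check written for this report (plain PyTorch ops, not ColossalAI code), reusing the same scale and mask_func as above; its output sums to 1 as expected:

import math
import torch

attention_head_size = 32
scale = 1 / math.sqrt(attention_head_size)

hidden_states = torch.randn(1, 4, 200, 200).half()
mask = torch.rand(1, 1, 200, 200) > 0.5

# Scale, apply the same mask fill, softmax in fp32, then cast back to fp16.
scores = (hidden_states.float() * scale).masked_fill(mask, -50000)
reference = torch.softmax(scores, dim=-1).half()
print(reference[0, 0, 0].sum())  # ~1.0, as a softmax over the last dimension should be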
Environment
Colossal-AI version: 0.1.13
PyTorch Version: 1.12.0
PyTorch Version required by Colossal-AI: 1.12
PyTorch version match: ✓
System CUDA Version: 11.2
CUDA Version required by PyTorch: 11.3
CUDA Version required by Colossal-AI: 11.3
CUDA Version Match: x
CUDA Extension: ✓
Thanks for your issue. We'll get back to you ASAP. :)
Finally, we found out that this code should be run on the GPU: the fused softmax kernel is a CUDA extension, so both the hidden states and the mask need to live on a CUDA device.
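A minimal sketch of the corrected call, reusing softmax, b, h, and length from the reproduction above and assuming a CUDA device is available:

# Construct the inputs directly on the GPU (or move existing tensors with .cuda()).
hidden_states = torch.randn(b, h, length, length, device="cuda").half()
mask = torch.rand(1, 1, length, length, device="cuda") > 0.5

output = softmax(hidden_states, mask)
print(output[0, 0, 0].sum())  # expected to sum to ~1.0 once the fused kernel runs on the GPU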