
[BUG]: FusedScaleMaskSoftmax last dimension does not sum to 1

Open · yhcc opened this issue 2 years ago · 1 comment

🐛 Describe the bug

I used the following code to test the softmax, but the last dimension of the result does not sum to one:

from colossalai import kernel
import math
import torch

attention_head_size = 32

softmax = kernel.FusedScaleMaskSoftmax(input_in_fp16=True,
                                       input_in_bf16=False,
                                       attn_mask_type=None,
                                       scaled_masked_softmax_fusion=True,
                                       mask_func=lambda x, mask: x.masked_fill(mask, -50000),
                                       softmax_in_fp32=True,
                                       scale=1 / math.sqrt(attention_head_size))

length = 200
b = 1
h = 4
hidden_states = torch.randn(b, h, length, length).half()
mask = torch.rand(1, 1, length, length)>0.5

print(softmax.is_kernel_available(mask, b, h, length, length))
output = softmax(hidden_states, mask) 
print(output[0, 0, 0].sum())  # the result will be something like tensor(1.1623e-05, dtype=torch.float16), not equal 1

However, if I purposely change the head count so that the fused kernel is not used, the result does sum to one.
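For comparison, here is a minimal sketch of that fallback check, reusing the setup above but constructing the module with scaled_masked_softmax_fusion=False so the plain PyTorch softmax path is taken instead of the fused CUDA kernel (the variable name softmax_fallback is mine, not from the original report):

# Same configuration as above, but with the fused kernel disabled so the
# non-fused PyTorch softmax fallback is used; this path is the one reported
# to sum to one.
softmax_fallback = kernel.FusedScaleMaskSoftmax(input_in_fp16=True,
                                                input_in_bf16=False,
                                                attn_mask_type=None,
                                                scaled_masked_softmax_fusion=False,  # force the fallback path
                                                mask_func=lambda x, mask: x.masked_fill(mask, -50000),
                                                softmax_in_fp32=True,
                                                scale=1 / math.sqrt(attention_head_size))

output = softmax_fallback(hidden_states, mask)
print(output[0, 0, 0].sum())  # expected to be approximately 1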

Environment

Colossal-AI version: 0.1.13

PyTorch Version: 1.12.0
PyTorch Version required by Colossal-AI: 1.12
PyTorch version match: ✓

System CUDA Version: 11.2
CUDA Version required by PyTorch: 11.3
CUDA Version required by Colossal-AI: 11.3
CUDA Version Match: ✗

CUDA Extension: ✓

yhcc avatar Jan 03 '23 09:01 yhcc

Thanks for your issue. We'll get back to you ASAP. :)

Sze-qq avatar Jan 05 '23 06:01 Sze-qq

Finally, we found out that this code needs to be run on the GPU.

yhcc avatar Jan 30 '23 13:01 yhcc
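For reference, a minimal sketch of the reproduction adjusted per the resolution above, with the inputs moved to a CUDA device (assuming a CUDA-capable GPU and the ColossalAI CUDA extension are available):

# Same reproduction as in the report, but with the input tensors moved to
# the GPU, which is what the resolution above calls for.
hidden_states = torch.randn(b, h, length, length).half().cuda()
mask = (torch.rand(1, 1, length, length) > 0.5).cuda()

output = softmax(hidden_states, mask)
print(output[0, 0, 0].sum())  # expected to be approximately 1 on the GPU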