
cutlass_scaled_nvfp4_mm_sm120 RuntimeError: Error Internal

Open changdong1687 opened this issue 2 months ago • 0 comments

Hello, I built the kernels on an RTX 5090 with CUDA 12.8. When I call the function cutlass_scaled_nvfp4_mm_sm120 with a bias that is not None, it raises RuntimeError: Error Internal

I tried:

m = A_fp4.shape[0]
n = B_fp4.shape[0]
dtype=torch.bfloat16

bias = None
cutlass_scaled_nvfp4_mm(A_fp4, B_fp4, A_scale, B_scale, alpha, dtype, bias) # works

# 2-D bias of shape (1, n)
bias = torch.randn((1, n), dtype=dtype).cuda()
cutlass_scaled_nvfp4_mm(A_fp4, B_fp4, A_scale, B_scale, alpha, dtype, bias) # raises RuntimeError: Error Internal

# 1-D bias of shape (n,)
bias = torch.randn(n, dtype=dtype).cuda()
cutlass_scaled_nvfp4_mm(A_fp4, B_fp4, A_scale, B_scale, alpha, dtype, bias) # also raises RuntimeError: Error Internal

How can I solve this problem?
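For anyone hitting the same error, one possible stopgap is sketched below. It is an assumption on my part, not a confirmed fix: since the call succeeds with bias = None, the bias can be added outside the kernel. The helper name scaled_mm_with_external_bias is hypothetical, and the kernel is passed in as mm_fn using the same positional argument order as the calls above. It assumes the output has shape (m, n) and costs one extra elementwise add.

```python
def scaled_mm_with_external_bias(mm_fn, A_fp4, B_fp4, A_scale, B_scale,
                                 alpha, dtype, bias=None):
    """Run the scaled matmul without a fused bias, then add the bias separately.

    mm_fn is assumed to take (A_fp4, B_fp4, A_scale, B_scale, alpha, dtype, bias),
    matching the calls shown in this issue.
    """
    # Always pass bias=None to the kernel, which is the path reported to work.
    out = mm_fn(A_fp4, B_fp4, A_scale, B_scale, alpha, dtype, None)
    if bias is None:
        return out
    # Accept both (n,) and (1, n) biases; broadcast over the m rows of the output.
    return out + bias.reshape(1, -1)


# Hypothetical usage with the tensors from this issue:
# out = scaled_mm_with_external_bias(
#     cutlass_scaled_nvfp4_mm, A_fp4, B_fp4, A_scale, B_scale, alpha, dtype, bias
# )
```

This only sidesteps the failing path and does not explain why the fused-bias call fails on sm120.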

changdong1687, Oct 25 '25 14:10