LightX2V
cutlass_scaled_nvfp4_mm_sm120 RuntimeError: Error Internal
Hello, I built the kernels on an RTX 5090 with CUDA 12.8. When I call the function cutlass_scaled_nvfp4_mm_sm120 with a bias that is not None, it raises RuntimeError: Error Internal.
I tried:
m = A_fp4.shape[0]
n = B_fp4.shape[0]
dtype=torch.bfloat16
bias = None
cutlass_scaled_nvfp4_mm(A_fp4, B_fp4, A_scale, B_scale, alpha, dtype, bias) # it works well
bias = torch.randn((1, n), dtype=dtype).cuda()
cutlass_scaled_nvfp4_mm(A_fp4, B_fp4, A_scale, B_scale, alpha, dtype, bias) # it errors
bias = torch.randn(n, dtype=dtype).cuda()
cutlass_scaled_nvfp4_mm(A_fp4, B_fp4, A_scale, B_scale, alpha, dtype, bias) # it also errors
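As a possible stopgap, the bias could be applied outside the kernel, since the bias = None call works. The sketch below is not part of LightX2V: mm_with_external_bias is a hypothetical helper, mm_fn stands in for cutlass_scaled_nvfp4_mm, and it assumes the kernel's bias is a plain per-output-column add that takes bias as its last positional argument.

```python
import torch


def mm_with_external_bias(mm_fn, *mm_args, bias=None):
    """Run mm_fn on its bias=None path, then add the bias in PyTorch.

    mm_fn is a hypothetical stand-in for cutlass_scaled_nvfp4_mm and is
    assumed to accept the bias as its last positional argument.
    """
    # Use the bias=None path, which is reported to work.
    out = mm_fn(*mm_args, None)
    if bias is not None:
        # Assumption: the fused bias is a per-column add, so a bias of
        # shape (n,) or (1, n) is broadcast over the m output rows.
        out = out + bias.reshape(1, -1).to(out.dtype)
    return out
```

With the tensors from the snippet above, the call would look like mm_with_external_bias(cutlass_scaled_nvfp4_mm, A_fp4, B_fp4, A_scale, B_scale, alpha, dtype, bias=bias). This costs an extra elementwise pass over the output, so it only sidesteps the failing fused-bias path rather than fixing it.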
So, how can I solve this problem?