[BUG] Fuse Reduction on SM 90

Open rajagond opened this issue 8 months ago • 1 comment

TORCH_CHECK(
    !fuse_reduction || input_dtype == at::ScalarType::Half,
    "Fuse reduction only support float16 type on SM80 due to instruction limitation.");

This check restricts fused reduction to float16 on every GPU architecture, even though the error message only mentions SM80.
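For illustration, here is a rough sketch of what an architecture-aware version of this check might look like. It is not the project's code: `DType`, `fuse_reduction_supported`, `check_fuse_reduction`, and the `sm_arch` parameter are hypothetical names, and the rule that bfloat16 is allowed on SM90 and newer is an assumption that nothing in this thread confirms.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for at::ScalarType, so the sketch compiles without libtorch.
enum class DType { Half, BFloat16, Float };

// ASSUMPTION: bfloat16 fused reduction works on SM90 and newer.
// This is not confirmed by the issue; adjust once the kernel support is known.
inline bool fuse_reduction_supported(DType dtype, int sm_arch) {
  if (dtype == DType::Half) return true;                  // allowed by the original check
  if (dtype == DType::BFloat16) return sm_arch >= 90;     // assumed, see above
  return false;                                           // everything else is rejected
}

// Throws if fused reduction is requested for an unsupported dtype/arch pair.
inline void check_fuse_reduction(bool fuse_reduction, DType dtype, int sm_arch) {
  if (fuse_reduction && !fuse_reduction_supported(dtype, sm_arch)) {
    throw std::invalid_argument(
        "Fuse reduction does not support this dtype on SM" +
        std::to_string(sm_arch) + ".");
  }
}
```

With a check shaped like this, `check_fuse_reduction(true, DType::BFloat16, 80)` would throw, while the same call with `sm_arch = 90` would pass. Relaxing the check alone would not make the kernel itself handle bfloat16 correctly.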

When I disable the check and run with fuse_reduction = True and bfloat16 on NVIDIA H100s, I get a memory error.

Apr 05 '25 09:04 rajagond

@zheng-ningxin Is fuse_reduction broken on SM90?

Apr 07 '25 08:04 houqi