Add dual-GEMM examples for SM90 (Hopper) and SM120 (Blackwell)

Open Inodayy opened this issue 3 months ago • 3 comments

Summary

Implements dual-GEMM examples for SM90 (Hopper) and SM120 (Blackwell) using CUTLASS 3.x.

The dual-GEMM operation implemented is:

  D0 = epilogue0(X @ B0, C0)
  D1 = epilogue1(X @ B1, C1)
  D2 = element_wise(D0, D1)
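For reference, the semantics above can be written as a naive host loop. This is a minimal sketch and not the PR's kernel. It assumes fp32 throughout, X stored M×K row-major, B0 and B1 both stored N×K with K contiguous (the "NK" layout the PR describes for Hopper), epilogue0/epilogue1 as the usual linear combination alpha·acc + beta·C, and element_wise as LeftSiLUAndMul, i.e. silu(D0) · D1 as in example 45_dual_gemm (my understanding of that op). The helper name dual_gemm_reference is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// SiLU(x) = x * sigmoid(x) = x / (1 + exp(-x)).
inline float silu(float x) { return x / (1.0f + std::exp(-x)); }

// Naive host reference for D2 = silu(D0) * D1 (hypothetical helper, not the PR's kernel).
//   X      : M x K, row-major (K contiguous)
//   B0, B1 : N x K, K contiguous ("NK"); both share the same layout
//   C0, C1, D2 : M x N, row-major
void dual_gemm_reference(std::size_t M, std::size_t N, std::size_t K,
                         const std::vector<float>& X,
                         const std::vector<float>& B0,
                         const std::vector<float>& B1,
                         const std::vector<float>& C0,
                         const std::vector<float>& C1,
                         float alpha0, float beta0,
                         float alpha1, float beta1,
                         std::vector<float>& D2) {
  assert(X.size() == M * K);
  assert(B0.size() == N * K && B1.size() == N * K);
  assert(C0.size() == M * N && C1.size() == M * N);

  D2.assign(M * N, 0.0f);
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      float acc0 = 0.0f, acc1 = 0.0f;
      for (std::size_t k = 0; k < K; ++k) {
        const float x = X[m * K + k];  // each element of X feeds both products
        acc0 += x * B0[n * K + k];
        acc1 += x * B1[n * K + k];
      }
      const float d0 = alpha0 * acc0 + beta0 * C0[m * N + n];  // epilogue0
      const float d1 = alpha1 * acc1 + beta1 * C1[m * N + n];  // epilogue1
      D2[m * N + n] = silu(d0) * d1;  // element_wise; D0 and D1 are never stored
    }
  }
}
```

The single K-loop feeding two accumulators mirrors the idea behind a dual GEMM: X is read once for both products, and only D2 is written out.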

Implementation details

  • Based on the single-GEMM examples 48_hopper_warp_specialized_gemm.cu (SM90) and 79a_blackwell_geforce_nvfp4_bf16_gemm.cu (SM120).

  • B0 and B1 must currently share the same layout; both are nonetheless passed separately to the builders so the layouts can be decoupled later. (Blackwell supports only the TN layout; Hopper assumes the NK layout for make_tma_copy_B_sm90 etc.)

  • D2 applies LeftSiLUAndMul, as in example 45_dual_gemm; this is implemented in store() of collective/sm90_epilogue_tma_warpspecialized_dual.hpp.

  • D0 and D1 are intermediate results only and are not stored.

  • Added template<class Op0, class Op1> parameters in fusion/sm90_callbacks… so that D0 and D1 can use distinct operations.
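The reason for separate Op0/Op1 parameters can be illustrated in plain C++. The sketch below uses a hypothetical DualEpilogue functor, templated on one op per GEMM branch plus a combining op, so D0 and D1 can use different epilogues while only D2 is produced. All type names here are illustrative assumptions and do not correspond to CUTLASS's fusion callback API.

```cpp
#include <cmath>

// Hypothetical per-branch epilogue: alpha * acc + beta * C (not a CUTLASS type).
struct LinearCombination {
  float alpha;
  float beta;
  float operator()(float acc, float c) const { return alpha * acc + beta * c; }
};

// Hypothetical per-branch epilogue that ignores the source value C.
struct AccumulatorOnly {
  float operator()(float acc, float /*c*/) const { return acc; }
};

// Hypothetical combiner mirroring LeftSiLUAndMul: silu(lhs) * rhs.
struct SiLUAndMul {
  float operator()(float lhs, float rhs) const {
    return lhs / (1.0f + std::exp(-lhs)) * rhs;
  }
};

// One op per GEMM branch plus a combiner. D0 and D1 are local temporaries;
// only the combined value (D2) is returned.
template <class Op0, class Op1, class Combine>
struct DualEpilogue {
  Op0 op0;
  Op1 op1;
  Combine combine;

  float operator()(float acc0, float c0, float acc1, float c1) const {
    const float d0 = op0(acc0, c0);  // D0 = epilogue0(X @ B0, C0)
    const float d1 = op1(acc1, c1);  // D1 = epilogue1(X @ B1, C1)
    return combine(d0, d1);          // D2 = element_wise(D0, D1)
  }
};
```

For example, DualEpilogue<LinearCombination, AccumulatorOnly, SiLUAndMul> epi{{1.0f, 0.0f}, {}, {}}; applies a scaled epilogue to the first branch and a pass-through to the second. Because the ops are template parameters, each branch's choice is resolved at compile time.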

Performance (all configurations identical to the single-GEMM examples)

SM90 (Hopper)

  • Problem size: 2048×2048×2048
  • Rasterization: Heuristic with max CTA swizzle 2
  • Avg runtime: 0.20429 ms
  • GFLOPS: 168,191
  • ≈5% faster than the two-single-GEMM baseline

SM120 (Blackwell)

  • Problem size: 2048×2048×2048
  • Avg runtime: 0.155648 ms
  • GFLOPS: 220,753
  • ≈30% slower than the two-single-GEMM baseline (I haven't been able to find the root cause yet)
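The GFLOPS figures above are consistent with counting both GEMMs, i.e. 2 × (2·M·N·K) FLOPs per dual-GEMM run. For 2048×2048×2048 that is 34,359,738,368 FLOPs, which reproduces both reported numbers from the measured runtimes. A small sketch of that arithmetic (the function names are illustrative, not from the PR):

```cpp
#include <cassert>
#include <cstdint>

// FLOPs for one M x N x K GEMM: one multiply and one add per multiply-accumulate.
double gemm_flops(std::int64_t m, std::int64_t n, std::int64_t k) {
  return 2.0 * static_cast<double>(m) * static_cast<double>(n) * static_cast<double>(k);
}

// GFLOP/s for a dual GEMM: both X @ B0 and X @ B1 are counted.
// The element-wise epilogue producing D2 is O(M*N) and is ignored.
double dual_gemm_gflops(std::int64_t m, std::int64_t n, std::int64_t k, double runtime_ms) {
  assert(runtime_ms > 0.0);
  const double flops = 2.0 * gemm_flops(m, n, k);
  return flops / (runtime_ms * 1.0e-3) / 1.0e9;
}
```

With the SM90 runtime of 0.20429 ms this gives roughly 168,191 GFLOPS, and with the SM120 runtime of 0.155648 ms roughly 220,753 GFLOPS.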

Notes

  • I am relatively new to CUTLASS C++; this work was implemented as a learning exercise. I followed an example structure similar to that of 63_hopper_gemm_with_weight_prefetch.
  • The SM120 example was my initial local starting point and can be removed if it is not needed.

Closes #1123

Inodayy, Oct 13 '25 14:10

@hwu36 @mnicely Hi, just checking whether 3.x dual-GEMM support is still planned, and whether this PR might be reviewed later if time allows. I'd appreciate any feedback on whether I'm on the right track. Thanks!

Inodayy, Oct 20 '25 23:10

@ANIKET-SHIVAM , @IonThruster @depaulmillz could you please take a look?

hwu36, Oct 28 '25 02:10

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot], Nov 27 '25 02:11