[BUG] EpilogueTileAuto doesn't work when tile shape is (128, 112, 64)
Describe the bug
I am using GEMM configs from generator.py; the failing op.configuration name is cutlass3x_sm90_tensorop_s64x56x16gemm_f16_f16_f32_void_f16_128x112x64_4x1x1_0_ttn_align8_warpspecialized_pingpong_epi_tma
The code is here: https://gist.github.com/henrylhtsang/aa797c7d280b792eef574317111ac215
Questions:
- Should I avoid using EpilogueTileAuto, or can this be fixed in the following logic? https://github.com/NVIDIA/cutlass/blob/eefa171318b79cbe2e78514d4cce5cd0fe919d0c/include/cutlass/epilogue/collective/builders/sm90_builder.inl#L115-L121
- What epilogue tile shape should I use?
Steps/Code to reproduce bug
Compile https://gist.github.com/henrylhtsang/aa797c7d280b792eef574317111ac215
Compilation fails on this assertion:
static_assert(size<1>(CtaTileMNK{}) % size<1>(shape(EpilogueTile{})) == 0, "EPI_TILE_N must divide CTA_N");
Expected behavior
Compilation should succeed.
You'd want to take the gcd with the tile's N extent, so that the chosen tile_n always divides CTA_N:
auto tile_n = cute::gcd(cute::min(_32{}, size<1>(TileShape_MNK{})), size<1>(TileShape_MNK{}));
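To see why the gcd helps for N = 112, here is a minimal plain-C++17 sketch of the arithmetic. It assumes the auto-selected EPI_TILE_N is currently min(32, CTA_N), as the suggested change implies. The helper names `epi_tile_n_min` and `epi_tile_n_gcd` are hypothetical, and plain `int`s stand in for CuTe's static integers. It is not the builder's actual code.

```cpp
#include <algorithm>  // std::min
#include <numeric>    // std::gcd (constexpr since C++17)

// Hypothetical stand-in for the assumed current choice: min(32, CTA_N).
constexpr int epi_tile_n_min(int cta_n) { return std::min(32, cta_n); }

// Hypothetical stand-in for the suggested choice: gcd(min(32, CTA_N), CTA_N).
constexpr int epi_tile_n_gcd(int cta_n) {
  return std::gcd(std::min(32, cta_n), cta_n);
}

// CTA_N = 112: min picks 32, which does not divide 112 (112 = 3 * 32 + 16),
// so "EPI_TILE_N must divide CTA_N" would fire.
static_assert(epi_tile_n_min(112) == 32);
static_assert(112 % epi_tile_n_min(112) != 0);

// The gcd variant picks 16, which does divide 112.
static_assert(epi_tile_n_gcd(112) == 16);
static_assert(112 % epi_tile_n_gcd(112) == 0);

// Sizes that 32 already divides, or that are below 32, are unchanged.
static_assert(epi_tile_n_gcd(128) == 32);
static_assert(epi_tile_n_gcd(16) == 16);
```

With this choice the epilogue tile for a 128x112x64 CTA tile would be 16 wide in N rather than 32. Whether that is the best-performing width is a separate question from whether it compiles.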
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
Keeping live.
Closed with https://github.com/NVIDIA/cutlass/pull/2220