[BUG] ElementC=void kernel reads non-void in `GemmDescription`
I am observing gemm_desc.C.element = bf16, when I set it as void.
Please use the following debug_branch
Please check if the below print is expected. description_.C.element bf16 // Is this expected?
We are choosing incorrect kernel because of this, is this a bug or I am messing something up?
./tools/profiler/cutlass_profiler --kernels=cutlass3x_sm90_tensorop_s64x128x32gemm_e4m3_e4m3_f32_void_bf16_128x128x128_1x2x1_0_tnn_align16_warpspecialized_epi_nosmem
description_.C.element bf16 // Is this expected?
description_.C.element bf16
description_.C.element bf16
description_.C.element bf16
description_.C.element bf16
description_.C.element bf16
description_.C.element bf16
description_.C.element bf16
=============================
Problem ID: 1
Provider: CUTLASS
OperationKind: gemm
Operation: cutlass3x_sm90_tensorop_s64x128x32gemm_e4m3_e4m3_f32_void_bf16_128x128x128_1x2x1_0_tnn_align16_warpspecialized_epi_nosmem
Status: Success
Verification: ON
Disposition: Passed
reference_device: Passed
cuBLAS: Not run
cuDNN: Not run
Arguments: --gemm_kind=universal --m=1024 --n=1024 --k=1024 --A=fe4m3:row --B=fe4m3:column --C=bf16:column --D=bf16:column \
--alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic \
--op_class=tensorop --accum=f32 --cta_m=128 --cta_n=128 --cta_k=128 --cluster_m=1 --cluster_n=2 --cluster_k=1 \
--stages=7 --warps_m=4 --warps_n=1 --warps_k=1 --inst_m=64 --inst_n=128 --inst_k=32 --min_cc=90 --max_cc=90 \
Bytes: 4194304 bytes
FLOPs: 2149580800 flops
FLOPs/Byte: 512
Runtime: 0.0113731 ms
Memory: 343.463 GiB/s
Math: 189005 GFLOP/s
=============================
CSV Results:
@thakkarV , @IonThruster , @hwu36 , @mnicely
@mnicely can we commit this for 3.6 please?
Yes.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
@manishucsd is this still an issue? TOT currently doesn't emit voidC with non-TMA epilogues since the primary purpose of voidC is to conserve shared memory for the mainloop
Only partially fixed. I think. Some of the ElementC=void templates still shows bf16. It is not just about the prints, we build library out of cutlass kernels and select a kernel at runtime based on these enums. So it is important for us that GemmDescription object correctly reflect CUTLASS GEMM kernel compile-time arguments at the runtime. I have workaround for now, but request if we can fix it? More broadly have some tests to verify that GemmDescription is correctly reflecting kernel's compile-time arguments.
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element bf16
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element bf16
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element bf16
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
@manishucsd I ran the latest cutlass code and the description_.C.element of above kernels are all void. Could you verify again?
Thanks @Junkai-Wu . I am unable to compile CUTLASS 3.8 at the moment https://github.com/NVIDIA/cutlass/issues/2064.
I could compile and verify with CUDA 12.8. I see compile-time ElementC=void now correctly reflected at runtime in the GemmDescription object. We can close this bug. I have more requests on GemmDescription object, which I will describe in a different issue. We can close this one. Thank you for fixing it.
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_nnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_nnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_nnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_nnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ntn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ntn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ntn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ntn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ttn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ttn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ttn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ttn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_nnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_nnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_nnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_nnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ntn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ntn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ntn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ntn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_tnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_tnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_tnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_tnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ttn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ttn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ttn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ttn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_nnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_nnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_nnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_nnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ntn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ntn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ntn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ntn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_tnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_tnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ttn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ttn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ttn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ttn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_nnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_nnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_nnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_nnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ntn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ntn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ntn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ntn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_tnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_tnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_tnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_tnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ttn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ttn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ttn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ttn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_nnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_nnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_nnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_nnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ntn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ntn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ntn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ntn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_tnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_tnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_tnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_tnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ttn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ttn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ttn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ttn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_nnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_nnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_nnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_nnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ntn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ntn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ntn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ntn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_tnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_tnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ttn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ttn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ttn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ttn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void