
[BUG] ElementC=void kernel reads non-void in `GemmDescription`

Open manishucsd opened this issue 1 year ago • 3 comments

I am observing `gemm_desc.C.element = bf16` even though I set ElementC to void.

Please use the following debug_branch

Please check whether the print below is expected.

We are choosing the incorrect kernel because of this. Is this a bug, or am I doing something wrong?

./tools/profiler/cutlass_profiler --kernels=cutlass3x_sm90_tensorop_s64x128x32gemm_e4m3_e4m3_f32_void_bf16_128x128x128_1x2x1_0_tnn_align16_warpspecialized_epi_nosmem
description_.C.element bf16       // Is this expected?
description_.C.element bf16
description_.C.element bf16
description_.C.element bf16
description_.C.element bf16
description_.C.element bf16
description_.C.element bf16
description_.C.element bf16



=============================
  Problem ID: 1

        Provider: CUTLASS
   OperationKind: gemm
       Operation: cutlass3x_sm90_tensorop_s64x128x32gemm_e4m3_e4m3_f32_void_bf16_128x128x128_1x2x1_0_tnn_align16_warpspecialized_epi_nosmem

          Status: Success
    Verification: ON
     Disposition: Passed

reference_device: Passed
          cuBLAS: Not run
           cuDNN: Not run

       Arguments: --gemm_kind=universal --m=1024 --n=1024 --k=1024 --A=fe4m3:row --B=fe4m3:column --C=bf16:column --D=bf16:column  \
                  --alpha=1 --beta=0 --split_k_mode=serial --split_k_slices=1 --batch_count=1 --raster_order=heuristic  \
                  --op_class=tensorop --accum=f32 --cta_m=128 --cta_n=128 --cta_k=128 --cluster_m=1 --cluster_n=2 --cluster_k=1  \
                  --stages=7 --warps_m=4 --warps_n=1 --warps_k=1 --inst_m=64 --inst_n=128 --inst_k=32 --min_cc=90 --max_cc=90  \
                 

           Bytes: 4194304  bytes
           FLOPs: 2149580800  flops
           FLOPs/Byte: 512

         Runtime: 0.0113731  ms
          Memory: 343.463 GiB/s

            Math: 189005 GFLOP/s


=============================

CSV Results:

manishucsd avatar Jul 15 '24 03:07 manishucsd

@thakkarV , @IonThruster , @hwu36 , @mnicely

manishucsd avatar Jul 16 '24 18:07 manishucsd

@mnicely can we commit this for 3.6 please?

thakkarV avatar Aug 02 '24 13:08 thakkarV

Yes.

mnicely avatar Aug 02 '24 14:08 mnicely

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Sep 01 '24 15:09 github-actions[bot]

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

github-actions[bot] avatar Nov 30 '24 15:11 github-actions[bot]

@manishucsd is this still an issue? TOT currently doesn't emit voidC with non-TMA epilogues since the primary purpose of voidC is to conserve shared memory for the mainloop

richardmcai avatar Jan 16 '25 03:01 richardmcai

Only partially fixed, I think. Some of the ElementC=void templates still show bf16. This is not just about the prints: we build a library out of CUTLASS kernels and select a kernel at runtime based on these enums, so it is important for us that the GemmDescription object correctly reflects the CUTLASS GEMM kernel's compile-time arguments at runtime. I have a workaround for now, but could this be fixed? More broadly, could we have some tests to verify that GemmDescription correctly reflects a kernel's compile-time arguments?

description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element bf16
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element bf16
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element bf16
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
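
The kind of check requested above could be sketched roughly as follows. This is a minimal Python sketch, not part of CUTLASS: it assumes the five type tokens after `gemm_` in a kernel name are A, B, accumulator, C, D (inferred from the names in this thread), and `element_c_from_name` and `find_mismatches` are hypothetical helper names. It only post-processes the debug prints shown here; it does not call any CUTLASS API.

```python
import re

# Assumption: the five type tokens after "gemm_" are A, B, accumulator, C, D,
# as the kernel names in this thread suggest.
NAME_RE = re.compile(
    r"gemm_([a-z0-9]+)_([a-z0-9]+)_([a-z0-9]+)_([a-z0-9]+)_([a-z0-9]+)_"
)


def element_c_from_name(kernel_name):
    """Return the C element token encoded in a kernel name, or None."""
    match = NAME_RE.search(kernel_name)
    return match.group(4) if match else None


def find_mismatches(log_text):
    """Pair each 'description_.name' line with the next
    'description_.C.element' line and collect disagreements."""
    mismatches = []
    current_name = None
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        if parts[0] == "description_.name":
            current_name = parts[1]
        elif parts[0] == "description_.C.element" and current_name is not None:
            expected = element_c_from_name(current_name)
            if expected is not None and expected != parts[1]:
                mismatches.append((current_name, expected, parts[1]))
            current_name = None
    return mismatches


# Two entries copied from the log in this comment.
LOG = """\
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element bf16
"""

for name, expected, reported in find_mismatches(LOG):
    print(f"{name}: name says C={expected}, description reports C={reported}")
```

On the two sample entries, only the `epi_nosmem` kernel is flagged, since its name encodes `void` while the description reports `bf16`. A real regression test would more likely compare against the template arguments directly; parsing the name is just the cheapest proxy available from the prints.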

manishucsd avatar Jan 16 '25 18:01 manishucsd

@manishucsd I ran the latest CUTLASS code, and the description_.C.element values of the above kernels are all void. Could you verify again?

Junkai-Wu avatar Jan 26 '25 05:01 Junkai-Wu

Thanks @Junkai-Wu. I am unable to compile CUTLASS 3.8 at the moment; see https://github.com/NVIDIA/cutlass/issues/2064.

manishucsd avatar Jan 26 '25 06:01 manishucsd

I could compile and verify with CUDA 12.8. I now see the compile-time ElementC=void correctly reflected at runtime in the GemmDescription object, so we can close this bug. I have more requests on the GemmDescription object, which I will describe in a different issue. Thank you for fixing it.

description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_nnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_nnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_nnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_nnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ntn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ntn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ntn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ntn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ttn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ttn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ttn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ttn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_1x2x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_nnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_nnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_nnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_nnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ntn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ntn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ntn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ntn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_tnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_tnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_tnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_tnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ttn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ttn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ttn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ttn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x256x64_2x1x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_nnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_nnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_nnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_nnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ntn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ntn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ntn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ntn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_tnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_tnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ttn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ttn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ttn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ttn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_1x2x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_nnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_nnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_nnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_nnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ntn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ntn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ntn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ntn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_tnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_tnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_tnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_tnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ttn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ttn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ttn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ttn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_2x1x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_nnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_nnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_nnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_nnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ntn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ntn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ntn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ntn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_tnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_tnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_tnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_tnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ttn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ttn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ttn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ttn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_128x128x64_2x1x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_nnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_nnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_nnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_nnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_nnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ntn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ntn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ntn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ntn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ntn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_tnn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_tnn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_tnn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_tnn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ttn_align8_warpspecialized_pingpong_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ttn_align8_warpspecialized_pingpong_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ttn_align8_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ttn_align8_warpspecialized_cooperative_epi_tma
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_nosmem
description_.C.element void
description_.name cutlass3x_sm90_tensorop_s64x128x16gemm_bf16_bf16_f32_void_bf16_256x128x64_1x2x1_0_ttn_align8_stream_k_warpspecialized_cooperative_epi_tma
description_.C.element void

manishucsd avatar Jan 26 '25 07:01 manishucsd