cutlass [DOC] Where does cutlass’ detailed GEMM kernel?

Hi! I am learning cutlass, and I see something like: (from official post)

/// CUTLASS SGEMM example
__global__ void gemm_kernel(void gemm_kernel(
    float *C, float *C, 
    float const *A, float const *A, 
    float const *B, float const *B, 
    int M, int M, 
    int N, int N, 
    int K) {int K) {

    // Define the GEMM tile sizes - discussed in next section// Define the GEMM tile sizes - discussed in next section
    typedef block_task_policy <typedef block_task_policy <
        128, // BlockItemsY: Height in rows of a tile128, // BlockItemsY: Height in rows of a tile
        32, // BlockItemsX - Width in columns of a tile32, // BlockItemsX - Width in columns of a tile
        8, // ThreadItemsY - Height in rows of a thread-tile8, // ThreadItemsY - Height in rows of a thread-tile
        4, // ThreadItemsX - Width in columns of a thread-tile4, // ThreadItemsX - Width in columns of a thread-tile
        8, // BlockItemsK - Depth of a tile8, // BlockItemsK - Depth of a tile
        true, // UseDoubleScratchTiles - whether to double-buffer SMEMtrue, // UseDoubleScratchTiles - whether to double-buffer SMEM
        block_raster_enum::Default // Block rasterization strategy::Default // Block rasterization strategy
    > block_task_policy_t;> block_task_policy_t;

    // Define the epilogue functor// Define the epilogue functor
    typedef gemm::blas_scaled_epilogue<float, float, float> epilogue_op_t ;typedef gemm::blas_scaled_epilogue<float, float, float> epilogue_op_t ;

    // Define the block_task type.// Define the block_task type.
    typedef block_task < typedef block_task < 
        block_task_policy_t, block_task_policy_t, 
        float, float, 
        float, float, 
        matrix_transform_t::NonTranspose, matrix_transform_t::NonTranspose, 
        4, 4, 
        matrix_transform_t::NonTranspose, matrix_transform_t::NonTranspose, 
        4, 4, 
        epilogue_op_t, epilogue_op_t, 
        4, 4, 
        true true 
    > block_task_t;> block_task_t;

    // Declare statically-allocated shared storage// Declare statically-allocated shared storage
    __shared__ block_task_t::scratch_storage_t smem;block_task_t::scratch_storage_t smem;

    // Construct and run the task// Construct and run the task
    block_task_t(block_task_t(
        reinterpret_cast(&smem),reinterpret_cast(&smem),
        &smem,&smem,
        A,,
        B,,
        C,,
        epilogue_op_t(1, 0),epilogue_op_t(1, 0),
        M,,
        N,,
        K).run();).run();
}}

To guide usage…of which can not see the base level implementation of the GEMM. I guess there should exist! But the github page of cutlass is kind of…messy…I tried hard myself!! But really can not find…

Could anyone kindly provide me a link? Thank you!!!

================

By the way, I see a “naive gemm” in cutlass github. Sorry, that is not what I want! Haha!

Jun 15 '22 03:06 Arsmart123

@Arsmart123, that blog is pretty outdated. To get started please look at our examples, specifically basic_gemm

Jun 15 '22 12:06 mnicely

There is a generic __global__ kernel used here. The type definition passed as its argument is unique for all the different operations.

All the kernel arguments are packed into a struct so the template function signature doesn't have to change as we add more members

Jun 19 '22 12:06 mnicely

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

Jul 19 '22 13:07 github-actions[bot]