Cutlass_EX
study of cutlass
0. Introduction
- Goal : Develop 4-bit primitive kernels using CUTLASS
1. Example List
example_1) custom code with CUTLASS
example_2) cutlass::uint4b_t
example_3) single-precision gemm template
- 00_basic_gemm
- This kernel computes the general matrix product (GEMM) using single-precision floating-point arithmetic and assumes all matrices have column-major layout.
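To make the column-major assumption concrete, here is a minimal host-side sketch of the same computation, C = alpha * A * B + beta * C. It is plain C++ and does not use CUTLASS; the function name `reference_sgemm_colmajor` and its parameter order are hypothetical, chosen only for illustration. In column-major layout, element (i, j) of a matrix with leading dimension ld is stored at index i + j * ld.

```cpp
#include <vector>

// Hypothetical CPU reference (not part of CUTLASS):
// C = alpha * A * B + beta * C, with every matrix stored column-major.
// A is M x K (leading dimension lda), B is K x N (ldb), C is M x N (ldc).
void reference_sgemm_colmajor(int M, int N, int K, float alpha,
                              const float* A, int lda,
                              const float* B, int ldb,
                              float beta, float* C, int ldc) {
  for (int j = 0; j < N; ++j) {        // column of C
    for (int i = 0; i < M; ++i) {      // row of C
      float acc = 0.0f;
      for (int k = 0; k < K; ++k) {
        // Column-major indexing: element (row, col) lives at row + col * ld.
        acc += A[i + k * lda] * B[k + j * ldb];
      }
      C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
    }
  }
}
```

A reference like this can be useful for checking the output of the GPU kernel on small problem sizes, since both operate on the same memory layout.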
example_4) mixed-precision gemm template with cutlass utilities
- 01_cutlass_utilities
- These utilities are intended to be useful supporting components for managing tensor and matrix memory allocations, initializing and comparing results, and computing reference output.
example_5) CUTLASS debugging tool
- 02_dump_reg_shmem
- Demonstrates the CUTLASS debugging tool for dumping fragments and shared memory
- dumping : Recording the state of memory at a specific point in time
example_6) CUTLASS layout visualization example
example_7) CUTLASS example to compute a batched strided gemm in two different ways
- 05_batched_gemm
- strided batched gemm : Specify pointers to the first matrices of the batch and the stride between consecutive matrices of the batch.
- array gemm : Copy pointers to all matrices of the batch to device memory.
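The difference between the two batching schemes comes down to how each matrix in the batch is located. The following host-only C++ sketch illustrates that addressing; it does not call CUTLASS, and the helper names `strided_matrix` and `make_pointer_array` are hypothetical. It assumes the batch is stored contiguously with a fixed stride, so both schemes resolve to the same addresses.

```cpp
#include <vector>

// Strided form: one base pointer plus a fixed element stride between
// consecutive matrices. Matrix b starts at base + b * batch_stride.
inline const float* strided_matrix(const float* base,
                                   long long batch_stride,
                                   int batch_idx) {
  return base + static_cast<long long>(batch_idx) * batch_stride;
}

// Array form: an explicit table holding one pointer per matrix.
// On a GPU this table would itself have to be copied to device memory
// before launching the kernel; that copy is omitted in this host sketch.
std::vector<const float*> make_pointer_array(const float* base,
                                             long long batch_stride,
                                             int batch_count) {
  std::vector<const float*> ptrs(batch_count);
  for (int b = 0; b < batch_count; ++b) {
    ptrs[b] = strided_matrix(base, batch_stride, b);
  }
  return ptrs;
}
```

The strided form needs only a pointer and a stride per operand, while the array form costs an extra pointer table but also allows matrices that are not evenly spaced in memory.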
example_8) CUTLASS turing gemm using tensor cores
example_9) CUTLASS turing convolution using tensor cores
example_10) CUTLASS ampere convolution using tensor cores
example_11) Handling Cutlass Tensors
example_12) Simple CUTLASS convolution using Tensor core
2. Guide
cd example_{number}
mkdir build
cd build
cmake ..
make
./main
3. Reference
- Cutlass : https://github.com/NVIDIA/cutlass