cutlass
CUDA Templates for Linear Algebra Subroutines
Hi! I am using Windows, and I have heard that CUTLASS can be used on Windows. Could you write some documentation to guide us? Thank you!
**Describe the bug** When using cutlass_profiler to profile batched GEMM operations, the command prints a lot of code information and produces unexpected output. **Steps/Code to reproduce bug** 1. Build cutlass_profiler 2....
**Is your feature request related to a problem? Please describe.** Recently, I tried to compile CUTLASS with CUDA 10 and nvcc 11.6, because nvcc is not bound to the CUDA library. If use...
There is about a 20% performance difference between the cutlass profiler's GemmUniversal kernel and my Gemm kernel (they appear to be the same kernel). **GPU: T4, persistence mode: ON, clocks locked at 1590 MHz** NVCC: 11.1...
I want to implement a convolution of two 16-bit integer inputs. I think this can be split into four convolutions of 8-bit inputs, along with some bit shifts /...
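A minimal sketch of that decomposition, assuming signed int16 inputs split as `x = hi * 256 + lo`, with `hi` a signed 8-bit value and `lo` an unsigned 8-bit value. It uses plain-Python 1-D convolutions rather than CUTLASS kernels; the helper names are illustrative only. Because convolution is bilinear, the four 8-bit results recombine with shifts of 16, 8, 8 and 0 bits.

```python
def conv1d(x, w):
    """Full 1-D convolution of two integer sequences (reference implementation)."""
    out = [0] * (len(x) + len(w) - 1)
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            out[i + j] += xi * wj
    return out


def split16(values):
    """Split each int16 into (signed high byte, unsigned low byte)."""
    hi = [v >> 8 for v in values]    # arithmetic shift keeps the sign: -128..127
    lo = [v & 0xFF for v in values]  # 0..255
    return hi, lo


def conv16_via_8bit(x, w):
    """Rebuild an int16 x int16 convolution from four 8-bit convolutions."""
    xh, xl = split16(x)
    wh, wl = split16(w)
    # (xh*256 + xl) conv (wh*256 + wl)
    #   = (xh conv wh) << 16 + (xh conv wl + xl conv wh) << 8 + (xl conv wl)
    hh = conv1d(xh, wh)
    hl = conv1d(xh, wl)
    lh = conv1d(xl, wh)
    ll = conv1d(xl, wl)
    return [(a << 16) + ((b + c) << 8) + d for a, b, c, d in zip(hh, hl, lh, ll)]
```

In a real kernel the low-byte operands are unsigned, so the 8-bit convolutions would have to accept mixed signed/unsigned inputs (or apply an offset correction), and the recombined sum may not fit in 32 bits; both points would need checking against the kernels actually used.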
**What is your question?** I'd like to know whether we can tell which Tensor Core instruction and which data types a given CUTLASS kernel uses. For example, for kernel...
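One possible starting point, sketched under assumptions: CUTLASS library kernel names appear to encode the opcode class, a one-letter accumulator type and the MMA instruction shape, as in a name of the form `cutlass_tensorop_s1688gemm_f16_256x128_32x2_nt_align8` (presumably tensor op, f32 accumulation, 16x8x8 instruction). The helper `describe_kernel` below is hypothetical and not part of CUTLASS; the letter codes and the name layout are assumptions inferred from such names, and only GEMM names are handled.

```python
import re

# Hypothetical helper, not part of CUTLASS. It assumes names shaped like
#   cutlass_<opclass>_..._<acc><instruction digits>[...]gemm_<tile>_<layout>_align<N>
OPCLASS_RE = re.compile(r"^cutlass_(?P<opclass>wmma_tensorop|tensorop|simt)_")
CORE_RE = re.compile(r"_(?P<acc>[hsdicz])(?P<shape>\d*)[a-z0-9]*?gemm_")

# Assumed one-letter codes for the accumulator data type.
ACCUMULATOR_TYPES = {"h": "f16", "s": "f32", "d": "f64", "i": "s32", "c": "cf32", "z": "cf64"}


def split_instruction_shape(digits):
    """Split a digit string such as '1688' into (M, N, K) = (16, 8, 8).

    Heuristic: accept only splits where every dimension is a power of two,
    and return None unless exactly one split qualifies.
    """
    candidates = []
    for i in range(1, len(digits) - 1):
        for j in range(i + 1, len(digits)):
            parts = (digits[:i], digits[i:j], digits[j:])
            if any(p.startswith("0") for p in parts):
                continue
            dims = tuple(int(p) for p in parts)
            if all(d & (d - 1) == 0 for d in dims):
                candidates.append(dims)
    return candidates[0] if len(candidates) == 1 else None


def describe_kernel(name):
    """Return opcode class, accumulator type and MMA shape, or None if unparsable."""
    opclass = OPCLASS_RE.match(name)
    core = CORE_RE.search(name)
    if opclass is None or core is None:
        return None
    digits = core.group("shape")
    return {
        "opcode_class": opclass.group("opclass"),
        "accumulator": ACCUMULATOR_TYPES[core.group("acc")],
        # SIMT kernels carry no instruction shape in their name.
        "instruction_shape": split_instruction_shape(digits) if digits else None,
    }
```

Mapping the recovered shape to a specific PTX `mma` variant, and the remaining name fields to operand types, would still need to be checked against the generator scripts in cutlass/tools/library.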
Hello, I'm reading the scripts in cutlass/tools/library and find it hard to understand why `TileDescription` gets its name in this format: https://github.com/NVIDIA/cutlass/blob/master/tools/library/scripts/library.py#L528 Why don't we use ``` python "{}x{}x{}_{}x{}x{}_{}".format(tbm, tbn,...
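For reference, a simplified sketch of the name format being asked about, assuming `procedural_name` in library.py concatenates threadblock M x N, then threadblock K x pipeline stages (so a 256x128x32 tile with 2 stages becomes `256x128_32x2`). The `TileDescription` here is a trimmed stand-in, not the real class.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TileDescription:
    """Trimmed stand-in for the library.py class (assumed fields only)."""
    threadblock_shape: Tuple[int, int, int]
    stages: int
    warp_count: Tuple[int, int, int]

    def procedural_name(self):
        m, n, k = self.threadblock_shape
        # Assumed format: "<M>x<N>_<K>x<stages>"; warp_count is not included.
        return "%dx%d_%dx%d" % (m, n, k, self.stages)


print(TileDescription((256, 128, 32), 2, (4, 2, 1)).procedural_name())  # 256x128_32x2
```

Under that assumption the warp arrangement never appears in the name, so two tiles that differ only in `warp_count` produce the same string.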
**Is your feature request related to a problem? Please describe.** Recently, more details about NVIDIA's latest H100 GPU were released (https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/). Tensor Cores will support FP8 E4M3 and...
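To make the E4M3 layout concrete, here is a small decoder sketch. It assumes the E4M3 variant described for Hopper: 1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, and NaN encoded only by the exponent/mantissa pattern `1111.111`. The function name is illustrative, not a CUTLASS API.

```python
import math


def decode_e4m3(byte):
    """Decode one FP8 E4M3 byte into a Python float (illustrative helper)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0xF and mantissa == 0x7:
        return math.nan  # the only NaN pattern; E4M3 has no infinities
    if exponent == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias = -6.
        return sign * (mantissa / 8.0) * 2.0 ** -6
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


print(decode_e4m3(0x7E))  # 448.0, the largest finite E4M3 value
```

Giving up infinities frees the top exponent for normal values, which is why E4M3 reaches 448 with only four exponent bits.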
Hi, this commit adds `warp_count` to the `procedural_name` of `TileDescription`. Please review, thanks :)
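A sketch of what such a change could look like. The `_<wm>x<wn>x<wk>` suffix is hypothetical (the commit's actual format is not shown here), the `TileDescription` is a trimmed stand-in, and `warp_shape` assumes each warp covers `threadblock_shape / warp_count` along each dimension.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TileDescription:
    """Hypothetical, trimmed stand-in for the library.py class."""
    threadblock_shape: Tuple[int, int, int]
    stages: int
    warp_count: Tuple[int, int, int]

    def warp_shape(self):
        # Assumed: each warp covers threadblock_shape / warp_count along M, N, K.
        return tuple(t // w for t, w in zip(self.threadblock_shape, self.warp_count))

    def procedural_name(self):
        m, n, k = self.threadblock_shape
        wm, wn, wk = self.warp_count
        # Hypothetical suffix appending the warp count.
        return "%dx%d_%dx%d_%dx%dx%d" % (m, n, k, self.stages, wm, wn, wk)


td = TileDescription((256, 128, 32), 3, (4, 2, 1))
print(td.procedural_name(), td.warp_shape())  # 256x128_32x3_4x2x1 (64, 64, 32)
```

With the warp count in the suffix, tiles that differ only in warp arrangement get distinct names.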