CUDA Templates for Linear Algebra Subroutines

Results 608 cutlass issues

Hi! I am using Windows, and I have heard cutlass can be used on Windows. Could you write some documentation to guide us? Thank you!

documentation
inactive-30d

**Describe the bug** When using cutlass_profiler to profile batched GEMM operations, the command prints lots of code info and gives weird outputs. **Steps/Code to reproduce bug** 1. build cutlass_profiler 2....

bug
inactive-30d

**Is your feature request related to a problem? Please describe.** Recently, I tried to compile cutlass with cuda10 and nvcc11.6, because nvcc is not bound to the cuda lib. If use...

feature request
inactive-30d

There is about a 20% performance difference between the cutlass profiler's GemmUniversal kernel and my Gemm kernel (they look like the same kernel). **GPU: T4, persistent mode: ON, locked on 1590MHz** NVCC: 11.1...

question
inactive-30d

I want to implement a convolution of two 16bit integer inputs. I think this can be split up into 4 convolutions of 8bit inputs, along with some bit shifts /...

question
inactive-30d

**What is your question?** I'd like to understand whether we can tell which tensor core instruction and which data types a given cutlass kernel uses. For example, for kernel...

question
inactive-30d

Hello, I'm reading the scripts in cutlass/tools/library and find it hard to understand why `TileDescription` gets its name in this format: https://github.com/NVIDIA/cutlass/blob/master/tools/library/scripts/library.py#L528 Why don't we use ``` python "{}x{}x{}_{}x{}x{}_{}".format(tbm, tbn,...

feature request
inactive-30d
inactive-90d

This commit set adds single-stage conv for cutlass

inactive-30d
inactive-90d

**Is your feature request related to a problem? Please describe.** Recently, more details about NVIDIA's latest H100 GPU were released at https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/ . Tensor Core will support FP8 E4M3 and...

feature request
inactive-30d
inactive-90d

Hi, this commit adds `warp_count` to the `procedural_name` of `TileDescription`. Please review, thanks : )

inactive-30d
inactive-90d