cutlass

CUDA Templates for Linear Algebra Subroutines

Results 608 cutlass issues

`./tools/profiler/cutlass_profiler --m=16 --n=16 --k=1024 --A=fe5m2:\* --B=fe5m2:\*` works just fine for me, as does any other combination of fp8 types, layouts, etc. I also noticed that your A type is fp16 but...

commit id: 757275f2796bb901575c633e2a32bc76ca84ffec; device arch: Hopper; change LayoutA to cutlass::layout::ColumnMajor; change LayoutB to cutlass::layout::RowMajor; ![image](https://github.com/NVIDIA/cutlass/assets/20987824/6e699c3a-d450-40b8-b405-04e567b60617) kernel will run RS kernel; profiling result: ![image](https://github.com/NVIDIA/cutlass/assets/20987824/05a797b2-c332-49c4-8c3c-818aa6140b6e) register spill; change Tile to Shape; no...

question
inactive-30d
inactive-90d

**Describe the bug** `make_tiled_copy` also should not secretly pad `Thr` and `Val`. See code sample and discussion. **Steps/Code to reproduce bug** ```cpp #include using namespace cute; int main() { std::vector...

bug
CuTe

**For example** In README.md line 48, `(/media/docs/cute/00_quickstart.md)`, I would suggest using a relative path such as `(./media/docs/cute/00_quickstart.md)`; just add a leading dot. This works on my local machine...

question
? - Needs Triage
inactive-30d

When running a standalone CUTLASS GEMM with a generated SM90 EVT-based epilogue that loads two auxiliary inputs (one broadcast, one with full dimensionality), I get a CUDA error about...

bug

### Bug description When running the provided code as a standalone executable, a CUDA illegal memory access is reported. Using compute-sanitizer, I could pinpoint this to an illegal shared memory...

bug
? - Needs Triage
inactive-30d
inactive-90d

**Describe the bug** Wrong traits for 64-bit integer **Steps/Code to reproduce bug** https://github.com/NVIDIA/cutlass/blob/main/tools/util/include/cutlass/util/type_traits.h#L118 **Expected behavior** N/A **Environment details (please complete the following information):** N/A **Additional context** I opened a PR:...

bug
inactive-30d
inactive-90d

I saw this in a blog. I am not sure whether it is correct, and I am also very interested in the detailed usage of the make_tensor function. By the way, shared memory,...

question
inactive-30d
inactive-90d
CuTe

Now I have tCrC, and I want to store it into shared memory. Can the `copy` function do that? Thanks!

feature request
inactive-30d
CuTe