cutlass
CUDA Templates for Linear Algebra Subroutines
`./tools/profiler/cutlass_profiler --m=16 --n=16 --k=1024 --A=fe5m2:\* --B=fe5m2:\*` works fine for me, as does any other combination of fp8 types, layouts, etc. I also noticed that your A type is fp16 but...
Commit ID: 757275f2796bb901575c633e2a32bc76ca84ffec; device arch: Hopper. Change `LayoutA` to `cutlass::layout::ColumnMajor` and `LayoutB` to `cutlass::layout::RowMajor`; the kernel will then run the RS kernel. Profiling result: register spill. Change Tile to Shape; no...
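Switching `LayoutA` and `LayoutB` changes how each operand is addressed in memory. As a minimal sketch, assuming a dense matrix with leading dimension `ld`, the offset rules behind `cutlass::layout::RowMajor` and `cutlass::layout::ColumnMajor` can be written as below; the helper names are hypothetical and not part of the CUTLASS API.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of CUTLASS): the (row, col) -> offset rules
// that cutlass::layout::RowMajor and cutlass::layout::ColumnMajor encode.
// `ld` is the leading dimension: the stride between consecutive rows
// (row-major) or between consecutive columns (column-major).
constexpr std::int64_t row_major_offset(std::int64_t row, std::int64_t col,
                                        std::int64_t ld) {
  return row * ld + col;  // elements of one row are contiguous
}

constexpr std::int64_t column_major_offset(std::int64_t row, std::int64_t col,
                                           std::int64_t ld) {
  return col * ld + row;  // elements of one column are contiguous
}

// Element (1, 2) of a 2x3 matrix is the last element in both layouts:
//   row-major,    ld = 3 (number of columns): 1 * 3 + 2 = 5
//   column-major, ld = 2 (number of rows):    2 * 2 + 1 = 5
static_assert(row_major_offset(1, 2, 3) == 5, "row-major offset");
static_assert(column_major_offset(1, 2, 2) == 5, "column-major offset");

// Neighbors differ: (0, 1) sits at offset 1 row-major but offset 2 column-major.
static_assert(row_major_offset(0, 1, 3) == 1, "row-major neighbor");
static_assert(column_major_offset(0, 1, 2) == 2, "column-major neighbor");
```

Which dimension is contiguous is what determines the vectorized load direction for each operand, so a layout swap like the one above can change which kernel variant is selected.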
**Describe the bug** `make_tiled_copy` also should not secretly pad `Thr` and `Val`. See code sample and discussion. **Steps/Code to reproduce bug** ```cpp #include using namespace cute; int main() { std::vector...
[QST] Links in the Markdown docs use absolute paths, which don't work on all platforms
**For example**, README.md line 48 links to `(/media/docs/cute/00_quickstart.md)`. I would suggest using a relative path such as `(./media/docs/cute/00_quickstart.md)` instead, which only adds a leading dot. This works on my local machine...
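A minimal sketch of the suggested change, with hypothetical helper names: a link target that starts with a single `/` is treated as root-absolute and gets a `.` prefix. This rewrite only preserves the target for files at the repository root, such as README.md; Markdown files in subdirectories would need a different relative prefix (for example `../`).

```cpp
#include <string>
#include <string_view>

// Hypothetical helpers (not part of the CUTLASS repo).
// True for root-absolute targets such as "/media/docs/cute/00_quickstart.md";
// false for relative targets ("./media/...") and protocol-relative ones ("//host/...").
constexpr bool is_root_absolute(std::string_view target) {
  return !target.empty() && target[0] == '/' &&
         !(target.size() >= 2 && target[1] == '/');
}

// Applies the suggested fix: prefix a root-absolute target with '.' so that it
// resolves relative to the directory of the linking file (the repo root for README.md).
std::string to_relative(std::string_view target) {
  std::string out = is_root_absolute(target) ? "." : "";
  out.append(target);
  return out;
}
```

For example, `to_relative("/media/docs/cute/00_quickstart.md")` yields `./media/docs/cute/00_quickstart.md`, while already-relative targets and full URLs are returned unchanged.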
When running a standalone CUTLASS GEMM with a generated SM90 EVT-based epilogue that loads two auxiliary inputs (one broadcast, one with full dimensionality), I get a CUDA error about...
### Bug description When running the provided code as a standalone executable, a CUDA illegal memory access is reported. Using compute-sanitizer, I could pinpoint this to an illegal shared memory...
**Describe the bug** Wrong traits for 64-bit integer **Steps/Code to reproduce bug** https://github.com/NVIDIA/cutlass/blob/main/tools/util/include/cutlass/util/type_traits.h#L118 **Expected behavior** N/A **Environment details (please complete the following information):** N/A **Additional context** I opened a PR:...
I saw this on a blog and am not sure whether it is correct. I am also very interested in detailed usage of the make_tensor function. By the way, shared memory,...
Now I have tCrC, and I want to store its values into shared memory. Can the copy function do that? Thanks!
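For the tCrC question, one common CuTe pattern, to our understanding, is to partition the shared-memory tensor with the same thread partitioning that produced tCrC (for example `thr_mma.partition_C(sC)`) and then call `cute::copy(tCrC, tCsC)`. The host-only sketch below does not use CuTe; it only models the underlying idea that `make_tensor` pairs a pointer with a layout (shape and stride), and that a generic copy walks matching logical coordinates even when source and destination strides differ. All names in it (`Layout2D`, `Tensor2D`, `make_tensor_2d`, `copy_2d`) are hypothetical.

```cpp
#include <cassert>

// Hypothetical host-only model (not the CuTe API): a rank-2 layout maps a
// logical coordinate (i, j) to the offset i * stride0 + j * stride1.
struct Layout2D {
  int shape0, shape1;
  int stride0, stride1;
  constexpr int operator()(int i, int j) const { return i * stride0 + j * stride1; }
};

// A "tensor" is just a pointer plus a layout.
template <class T>
struct Tensor2D {
  T* data;
  Layout2D layout;
  T& operator()(int i, int j) const { return data[layout(i, j)]; }
};

// Rough analogue of make_tensor(ptr, layout): bundles the two together.
template <class T>
Tensor2D<T> make_tensor_2d(T* ptr, Layout2D layout) { return {ptr, layout}; }

// Element-wise copy over matching logical coordinates. Source and destination
// may use different strides (e.g. a compact register fragment versus a
// strided shared-memory tile), but their shapes must agree.
template <class T>
void copy_2d(const Tensor2D<T>& src, const Tensor2D<T>& dst) {
  assert(src.layout.shape0 == dst.layout.shape0 &&
         src.layout.shape1 == dst.layout.shape1);
  for (int j = 0; j < src.layout.shape1; ++j)
    for (int i = 0; i < src.layout.shape0; ++i)
      dst(i, j) = src(i, j);
}
```

Because the copy is defined on logical coordinates, the destination's stride pattern (here standing in for the shared-memory layout) decides where each fragment element lands; in real CuTe code this is why the shared-memory tensor must be partitioned the same way as the register fragment.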