Results: 5 issues by Charlie Durham

A SYRK example in cuBLASLt would be really useful, i.e. matmul(A, A transpose). One of the cublasLtMatmulAlgoCapAttributes_t values is for uplo support and mentions SYRK. However, I don't know how I...

Label: cuBLASLt
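
The operation being asked about, C = A·Aᵀ with only one triangle of C stored, can be pinned down with a small host-side reference that is handy for checking GPU output. This is a plain C++ sketch, not a cuBLASLt call. The function name `syrk_lower_ref`, the row-major storage, and the choice of the lower triangle are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference for SYRK: C = alpha * A * A^T + beta * C,
// where A is n x k (row-major) and only the lower triangle of the
// n x n matrix C (row-major) is read and written, as a SYRK with
// uplo = lower would do. The upper triangle is left untouched.
void syrk_lower_ref(std::size_t n, std::size_t k, float alpha,
                    const std::vector<float>& A, float beta,
                    std::vector<float>& C) {
    assert(A.size() == n * k && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {  // lower triangle only: j <= i
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                // (A * A^T)(i, j) is the dot product of rows i and j of A
                acc += A[i * k + p] * A[j * k + p];
            }
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}
```

Because A·Aᵀ is symmetric, computing only j <= i does half the dot products of a full GEMM; that saving is the point of asking for uplo support.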

Hello, I'm making some progress running some of these different layouts. I copied a setup for SM80_16x8x8_F32TF32TF32F32_TN from the `default_gemm_configuration.hpp` file and I now have the following: ``` template <...

Labels: question, ? - Needs Triage
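
The `SM80_16x8x8_F32TF32TF32F32_TN` atom computes one 16x8x8 tile of D = A·B + C with FP32 accumulation. A plain C++ reference for a single tile makes it easier to check results. This sketch assumes "TN" means both operands are K-major (A is M x K and B is N x K, each with K contiguous), matching the `row.col` operand order of the underlying PTX `mma` instruction. The function name `mma_tn_ref` is hypothetical, and TF32 rounding of the inputs is not emulated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tile shape of the SM80 16x8x8 TF32 MMA atom.
constexpr std::size_t kM = 16, kN = 8, kK = 8;

// Hypothetical host reference for one "TN" tile: A is M x K with K contiguous,
// B is N x K with K contiguous (both K-major), and C/D are M x N with N
// contiguous. Inputs are plain FP32; the TF32 rounding the tensor cores apply
// to A and B is NOT emulated, so compare GPU results with a tolerance.
std::vector<float> mma_tn_ref(const std::vector<float>& A,
                              const std::vector<float>& B,
                              const std::vector<float>& C) {
    assert(A.size() == kM * kK && B.size() == kN * kK && C.size() == kM * kN);
    std::vector<float> D(kM * kN);
    for (std::size_t m = 0; m < kM; ++m) {
        for (std::size_t n = 0; n < kN; ++n) {
            float acc = C[m * kN + n];
            for (std::size_t k = 0; k < kK; ++k) {
                acc += A[m * kK + k] * B[n * kK + k];  // both operands walk along K
            }
            D[m * kN + n] = acc;
        }
    }
    return D;
}
```

Since both operands are indexed along K with stride 1, a TN problem needs no transposition before the dot product; NT or NN variants would change only the index expressions for A and B.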

I wrote out this permutation for a TN TF32 16x8x8. I was trying to get the threads to be contiguous when writing N major. I have the following TiledMMA with...

Labels: question, ? - Needs Triage
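
Getting threads contiguous along N starts from knowing which accumulator elements each lane owns. Per the PTX ISA description of the FP32 accumulator fragment for `mma.m16n8k8`, lane `l` with `groupID = l / 4` and `threadID_in_group = l % 4` holds four values: `c0, c1` at row `groupID` and `c2, c3` at row `groupID + 8`, each at column `2 * threadID_in_group + (i & 1)`. The sketch below only tabulates that mapping on the host. It is not CuTe code, and the helper name `acc_coord` is hypothetical.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: (row, col) inside the 16x8 accumulator tile of
// element i (0..3) held by lane `lane` (0..31), following the PTX fragment
// layout for mma.m16n8k8 with FP32 accumulators.
std::pair<int, int> acc_coord(int lane, int i) {
    assert(lane >= 0 && lane < 32 && i >= 0 && i < 4);
    int group_id = lane / 4;         // "groupID" in the PTX ISA
    int thread_in_group = lane % 4;  // "threadID_in_group"
    int row = group_id + (i >= 2 ? 8 : 0);
    int col = thread_in_group * 2 + (i & 1);
    return {row, col};
}
```

Reading the table this produces: each row of 8 columns is covered by 4 consecutive lanes with 2 elements each, so when C is stored with N contiguous, a lane's `c0/c1` pair is adjacent in memory and lanes `4g..4g+3` together cover 8 contiguous floats of row `g`.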

If I run a CuTe layout in TN with DefaultCopy as the s2r (shared-memory-to-register) atom, I get exactly the same results as the Ampere TF32 CUTLASS kernel. I verified that the...

Labels: question, ? - Needs Triage

Hello, I wanted to see whether cuBLASDx was a feasible replacement for something I thought I needed to do with CUTLASS/CuTe. Here was my workflow: * Oh they have a...

Label: cuBLASdx