mammoth831

3 issues by mammoth831

Hi, @thakkarV https://github.com/NVIDIA/cutlass/blob/47a3ebbea9860e14c095b52c4e6e2db33340f572/include/cutlass/epilogue/collective/sm70_epilogue_vectorized.hpp#L237 Strangely, it requires TiledCopyS2R's threads to equal the MMA AtomC's threads. I think this is where we describe how each thread does LDS, and therefore it should be: ```c++...

question
? - Needs Triage
inactive-30d
inactive-90d

**Describe the bug** I find that `` must be included before ``, otherwise we cannot compile. **Steps/Code to reproduce bug** Compile with `nvcc test.cu -I include/ -std=c++17` ```c++ #include //...

bug

**What is your question?** Suppose we are calculating a 4x4 tensor and we only have a 2x4 smem resource. When the results are computed by different warp groups, e.g. the...

question
? - Needs Triage
inactive-30d
inactive-90d