YSF
**What is your question?** I am trying to use `cutlass::conv::device::Convolution` with a fixed ThreadblockShape, WarpShape, and InstructionShape. It fails with an internal error, which is actually "too many resources requested". It may...
Hello, with some modifications, such as to ElementC, LayoutA, and LayoutB, I can run the example https://github.com/NVIDIA/cutlass/blob/main/examples/70_blackwell_gemm/70_blackwell_fp8_gemm.cu successfully. But with the same problem size, the CUTLASS profiler does not profile the kernel...
**What is your question?** Hi. I am using CUTLASS to compute FP16 GEMM and FP8 per-tensor-scaled GEMM on sm1xx. With an older version of CUTLASS, it seems A/B/C/D must be...