Jim Wu
Thanks for the response, that makes sense. I've rerun the benchmarks on an A100 and the results are closer; however, there are still some areas where the sparse kernels...
I've put the profiler output for the best one at M=768, N=4096, K=4096 in a CSV in this [gist](https://gist.github.com/jimwu6/d2b766f12fed3831894c4d991b12f84f).
The tile size refers to cta_{m, n, k}, correct? If so, my first thought is that cutlass_profiler should be running all possible configurations it has given the settings...
Using that flag produces no new kernels: cutlass_profiler reports no additional results.
I have the same contents as you. However, the profiler only seems to run the 64x128 configuration, even when I don't specify cta. I am running exactly ``` ../tools/profiler/cutlass_profiler --operation=spgemm --m=768...
When I run that, the only sparse GEMMs that don't have `--cta_m=64 --cta_n=128` are those with f32, s4, or s8 inputs.
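For anyone who wants to check the same thing on their own run, here is a rough sketch of how the tile shapes could be tallied from the profiler's CSV output. It assumes the CSV has columns named `cta_m` and `cta_n`; adjust the names if your output differs. The helper `tile_shape_counts` and the sample rows are hypothetical and only illustrate the idea, they are not real profiler results.

```python
import csv
import io
from collections import Counter


def tile_shape_counts(csv_text, m_col="cta_m", n_col="cta_n"):
    """Count how many profiled kernels use each (cta_m, cta_n) tile shape.

    The column names are assumptions about the profiler's CSV header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        counts[(int(row[m_col]), int(row[n_col]))] += 1
    return counts


# Hypothetical rows for illustration only, not real profiler output.
sample = """Operation,cta_m,cta_n
kernel_a,64,128
kernel_b,64,128
kernel_c,128,128
"""

counts = tile_shape_counts(sample)
for shape, n in sorted(counts.items()):
    print(shape, n)
```

If only one shape shows up in the tally, the profiler really did run a single tile configuration for that problem size.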