TUMSchieben comments

Results: 7 comments by TUMSchieben

[BUG] batch GEMM execution via cutlass_profiler gives weird outputs

I just followed the README guide: cmake .. -DCUTLASS_NVCC_ARCHS=80, then make cutlass_profiler -j16

[BUG] batch GEMM execution via cutlass_profiler gives weird outputs

> What type of batch are you interested in? Data types, layouts, architectures, etc. I'd like to know the performance of batch matmul ops used in typical transformer models, and...

[BUG] batch GEMM execution via cutlass_profiler gives weird outputs

> 2D data. Sorry for the error, I mean a normal layout is OK. It's not limited to row-major or column-major. I'm just curious about the peak performance that...

[BUG] batch GEMM execution via cutlass_profiler gives weird outputs

> BTW, if you want to run batch GEMM for the transformer model, Group GEMM may be more useful to you. Check https://github.com/NVIDIA/cutlass/tree/master/examples/24_gemm_grouped . Group GEMM is not runnable in...

[BUG] batch GEMM execution via cutlass_profiler gives weird outputs

> 36 is not a multiple of 8. The kernel instantiated in the example needs M to be a multiple of 8. You can change the alignment to run 36. Or you...
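
For anyone hitting the same alignment issue, here is a minimal sketch of the divisibility check described above. It is not part of CUTLASS: the helper name largest_alignment and the candidate list (8, 4, 2, 1) are assumptions for illustration, and the alignments a given kernel actually supports depend on its data type and template parameters.

```python
def largest_alignment(extent, candidates=(8, 4, 2, 1)):
    """Return the largest candidate alignment (in elements) that divides extent.

    Hypothetical helper for illustration only; the candidate list is an
    assumption, not something queried from CUTLASS.
    """
    for alignment in sorted(candidates, reverse=True):
        if extent % alignment == 0:
            return alignment
    raise ValueError(f"no candidate alignment divides {extent}")


if __name__ == "__main__":
    m = 36
    print(m % 8 == 0)            # False: 36 is not a multiple of 8
    print(largest_alignment(m))  # 4: the largest candidate that divides 36
```

Under these assumptions, M = 36 fails the multiple-of-8 requirement but would pass with an alignment of 4, which is the kind of change the reply suggests.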

[BUG] batch GEMM execution via cutlass_profiler gives weird outputs

> The problems are resolved after updating the repo to the latest commit.