TUMSchieben
I just followed the README guide:
cmake .. -DCUTLASS_NVCC_ARCHS=80
make cutlass_profiler -j16
> What type of batch are you interested in? Data types, layouts, architectures, etc. I'd like to know the performance of batch matmul ops used in typical transformer models, and...
> 2D data. Sorry for the error, I mean a normal layout is OK. It's not limited to row major or column major. I'm just curious about the peak performance that...
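To compare a measured batched GEMM runtime against a GPU's peak, the usual convention counts 2·M·N·K floating-point operations per GEMM (one multiply and one add per multiply-accumulate), multiplied by the batch count. The sketch below is a minimal illustration of that arithmetic. The function name `batched_gemm_tflops` is hypothetical and is not part of CUTLASS or `cutlass_profiler`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not a CUTLASS API): achieved throughput of a batched
// GEMM in TFLOP/s. Each GEMM of shape (M x K) * (K x N) is counted as
// 2*M*N*K FLOPs, and the batch contains `batch` independent GEMMs.
double batched_gemm_tflops(std::int64_t m, std::int64_t n, std::int64_t k,
                           std::int64_t batch, double runtime_ms) {
  const double flops = 2.0 * static_cast<double>(m) * static_cast<double>(n) *
                       static_cast<double>(k) * static_cast<double>(batch);
  const double seconds = runtime_ms * 1e-3;
  return flops / seconds / 1e12;
}
```

For example, 96 GEMMs of shape 512×512×64 finishing in 1.0 ms work out to about 3.22 TFLOP/s. That number can then be set against the datasheet peak for the data type in use.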
> BTW, if you want to run batched GEMM for a transformer model, grouped GEMM may be more useful to you. Check https://github.com/NVIDIA/cutlass/tree/master/examples/24_gemm_grouped . Grouped GEMM is not runnable in...
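A strided batched GEMM runs one problem shape (M, N, K) for every batch entry. A grouped GEMM instead accepts a list of problems that may each have a different shape. The sketch below illustrates why that matters for transformer workloads with variable sequence lengths. It is only a sketch: the `Problem` struct and both functions are hypothetical and do not belong to the CUTLASS API. It checks whether a set of problems fits a uniform batch, and how much extra work padding every problem to the largest shape would cost.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical problem descriptor (not a CUTLASS type).
struct Problem {
  std::int64_t m, n, k;
};

// A uniform batched GEMM only fits when every problem shares the same shape;
// otherwise a grouped GEMM (one shape per problem) is the natural fit.
bool needs_grouped_gemm(const std::vector<Problem>& problems) {
  for (const Problem& p : problems) {
    const Problem& first = problems.front();
    if (p.m != first.m || p.n != first.n || p.k != first.k) return true;
  }
  return false;
}

// Ratio of FLOPs executed if every problem is zero-padded to the largest
// (M, N, K) so a uniform batch can be used, versus FLOPs actually needed.
// Assumes a non-empty list with positive dimensions.
double padding_overhead(const std::vector<Problem>& problems) {
  std::int64_t max_m = 0, max_n = 0, max_k = 0;
  double useful = 0.0;
  for (const Problem& p : problems) {
    max_m = std::max(max_m, p.m);
    max_n = std::max(max_n, p.n);
    max_k = std::max(max_k, p.k);
    useful += 2.0 * static_cast<double>(p.m) * p.n * p.k;
  }
  const double padded = 2.0 * static_cast<double>(max_m) * max_n * max_k *
                        static_cast<double>(problems.size());
  return padded / useful;
}
```

With two problems of M = 128 and M = 256 (N = K = 64), padding to a uniform batch does 4/3 of the necessary work. A grouped GEMM avoids that overhead.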
> 36 is not a multiple of 8. The kernel instantiated in the example needs M to be a multiple of 8. You can change the alignment to run 36. Or you...
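CUTLASS device-level GEMM templates typically take the operand alignment as a template parameter counted in elements. A kernel built with alignment 8 therefore needs the relevant extent (here M) to be divisible by 8. Since 36 = 4 × 9, an alignment of 4 (or 2, or 1) would divide it, usually at the cost of narrower vectorized memory accesses. The sketch below picks the largest power-of-two alignment that divides every given extent. `pick_alignment` is a hypothetical helper, not a CUTLASS function, and which extents must be checked depends on the operand layouts.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper (not a CUTLASS API): returns the largest alignment in
// {max_alignment, max_alignment/2, ..., 1} that evenly divides every extent,
// e.g. M and the leading dimensions of the operands.
// Assumes max_alignment is a positive power of two.
int pick_alignment(std::initializer_list<std::int64_t> extents,
                   int max_alignment = 8) {
  for (int a = max_alignment; a > 1; a /= 2) {
    bool divides_all = true;
    for (std::int64_t e : extents) {
      if (e % a != 0) {
        divides_all = false;
        break;
      }
    }
    if (divides_all) return a;
  }
  return 1;  // Alignment 1 always works, but gives the narrowest accesses.
}
```

For M = 36 this yields 4. That would be the value to use when re-instantiating the example kernel with a smaller alignment.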
> The problems are resolved after updating the repo to the latest commit.