mgcpp
Make an efficient CUDA microbenchmark framework
Writing and optimizing CUDA kernels is currently difficult because there is no proper, consistent way to measure kernel performance. A simple, consistent tool for measuring and profiling CUDA kernels is required.
Requirements
- Automatic measuring of FLOPS (probably using nvprof)
- Measuring of parallel scaling
- Simple, concise API
- Plotting the benchmark reports (probably using pyplot, gnuplot)
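To make the requirements above more concrete, here is a minimal host-side sketch of what the timing and scaling parts of such an API could look like. All names (`median_seconds`, `gflops`, `speedup`, `parallel_efficiency`) are hypothetical and are not taken from mgcpp or mgbench. The sketch also assumes the caller supplies the floating-point operation count by hand, whereas the issue proposes measuring it automatically, and it times on the host with `std::chrono` rather than with a profiler.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper names; none of these come from mgcpp or mgbench.

// Runs `fn` `warmup` times untimed, then `runs` times timed, and returns the
// median wall-clock duration in seconds. For a CUDA kernel, `fn` would have to
// launch the kernel and then synchronize (e.g. cudaDeviceSynchronize()),
// because kernel launches are asynchronous with respect to the host.
inline double median_seconds(const std::function<void()>& fn,
                             std::size_t runs = 10,
                             std::size_t warmup = 2) {
    assert(runs > 0);
    for (std::size_t i = 0; i < warmup; ++i) fn();

    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }

    // The median is less sensitive to outliers than the mean.
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = runs / 2;
    return (runs % 2 != 0) ? samples[mid]
                           : 0.5 * (samples[mid - 1] + samples[mid]);
}

// Throughput in GFLOP/s, given an operation count supplied by the caller
// (assumption: the count is known analytically, e.g. 2*n^3 for a naive GEMM).
inline double gflops(double flop_count, double seconds) {
    assert(seconds > 0.0);
    return flop_count / seconds / 1e9;
}

// Speedup of a scaled run relative to a baseline run of the same problem.
inline double speedup(double t_base, double t_scaled) {
    assert(t_scaled > 0.0);
    return t_base / t_scaled;
}

// Parallel efficiency: speedup divided by the factor by which resources grew
// (e.g. 8.0 when going from 1 to 8 thread blocks or devices).
inline double parallel_efficiency(double t_base, double t_scaled,
                                  double resource_ratio) {
    assert(resource_ratio > 0.0);
    return speedup(t_base, t_scaled) / resource_ratio;
}
```

A benchmark would then be a single call such as `median_seconds([&] { launch_my_kernel(); })`, where `launch_my_kernel` is a placeholder for user code, and the resulting times can be written out as plain columns for pyplot or gnuplot.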
Working on this in a separate repository: https://github.com/MGfoundation/mgbench