Change Default Normal Unit to per_kernel
As an application developer, I think in terms of kernels rather than waves/wavefronts. Per kernel is also the normalization used in Nsight Compute (ncu), so making it the default would make it easier to compare output with ncu, which is one of my common use cases.
Thanks for the suggestion. Per-kernel comparison with ncu is not only your use case, and it is already in our plan.
Hi @MrBurmark, sorry for the late follow-up. @feizheng10 has put in a change (https://github.com/ROCm/rocprofiler-compute/pull/555) that makes per kernel the default normalization. Let me know if you have any other concerns.
Closing this issue as it is resolved. @MrBurmark, feel free to re-open it if you have any follow-up questions or concerns.