Inaccurate device counter trace

Open sfantao opened this issue 2 years ago • 2 comments

Using as an example https://github.com/amd/HPCTrainingExamples/tree/main/HIPIFY/mini-nbody/hip, if I get device counters with rocprof using:

> cat $wd/counters.txt
pmc : WriteSize FetchSize
> bash -c "export ROCR_VISIBLE_DEVICES=0 ; rocprof -i $wd/counters.txt ./nbody-orig $((12*65536))"

I get:

Index,KernelName,gpu-id,queue-id,queue-index,pid,tid,grd,wgr,lds,scr,arch_vgpr,accum_vgpr,sgpr,wave_size,sig,obj,WriteSize,FetchSize
0,"bodyForce(Body*, float, int) [clone .kd]",4,0,0,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,36723.0000000000,524628.5625000000
1,"bodyForce(Body*, float, int) [clone .kd]",4,0,2,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,17505.1250000000,488091.6250000000
2,"bodyForce(Body*, float, int) [clone .kd]",4,0,4,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,17510.6250000000,487910.1250000000
3,"bodyForce(Body*, float, int) [clone .kd]",4,0,6,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,33072.5000000000,2820859.8125000000
4,"bodyForce(Body*, float, int) [clone .kd]",4,0,8,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,32875.0000000000,1719172.6875000000
5,"bodyForce(Body*, float, int) [clone .kd]",4,0,10,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,31081.0000000000,668958.1250000000
6,"bodyForce(Body*, float, int) [clone .kd]",4,0,12,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,17516.0000000000,488220.2500000000
7,"bodyForce(Body*, float, int) [clone .kd]",4,0,14,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,32861.8750000000,3522902.0625000000
8,"bodyForce(Body*, float, int) [clone .kd]",4,0,16,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,17505.0000000000,488151.7500000000
9,"bodyForce(Body*, float, int) [clone .kd]",4,0,18,148495,148495,786432,256,0,0,16,0,16,64,0x0,0x7f4abf7508c0,32938.8750000000,2949121.8750000000

If I use omnitrace with a configuration containing:

OMNITRACE_ROCM_EVENTS                              = FetchSize:device=0 WriteSize:device=0

and run:

bash -c "export ROCR_VISIBLE_DEVICES=0 ; omnitrace-sample ./nbody-orig $((12*65536))"

I get: [image]

That is, the counters do not show any fluctuation, although the rocprof output indicates that they should.

Tested on ROCm 5.7.0 with omnitrace installed from omnitrace-1.10.4-ubuntu-20.04-ROCm-50700-PAPI-OMPT-Python3.sh.

For completeness, on a different machine with ROCm 5.6.1 I see things like:

[image]

Again there are no fluctuations, but for the first kernel the reading starts out correct and then shifts in the middle of the kernel.

sfantao avatar Nov 29 '23 10:11 sfantao

There are a couple of things going on here. First, I believe the default view of the timelines is the accumulation of the counters, so you will not see them fluctuate but instead grow over time. If you click on the lightning-bolt-looking thing, you can change the view; I think one of the options is the delta. Second, there are likely some discrepancies from mapping hardware counters for kernels onto the kernel-independent timeline. Third, I don't have a ton of confidence in the timing alignment between omnitrace's current use of roctracer for kernel timing and the kernel timings that rocprofiler reports alongside the HW counters. This needs to be investigated.
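
To illustrate the first point, here is a minimal sketch of the arithmetic only, not of how omnitrace or the trace viewer actually computes its tracks. It takes the per-dispatch FetchSize values from the rocprof output in this issue, builds a running total (the assumed "accumulated" view), and then differences it (the assumed "delta" view). The variable names are made up for the example.

```python
from itertools import accumulate

# FetchSize per bodyForce dispatch, copied from the rocprof output in this issue.
fetch_size = [
    524628.5625, 488091.625, 487910.125, 2820859.8125, 1719172.6875,
    668958.125, 488220.25, 3522902.0625, 488151.75, 2949121.875,
]

# Assumed "accumulated" view: a running total, which can only grow.
accumulated = list(accumulate(fetch_size))

# Assumed "delta" view: difference between consecutive accumulated samples,
# which recovers the per-dispatch values and their fluctuation.
delta = [accumulated[0]] + [b - a for a, b in zip(accumulated, accumulated[1:])]

for i, (acc, d) in enumerate(zip(accumulated, delta)):
    print(f"dispatch {i}: accumulated={acc:14.2f} delta={d:14.2f}")
```

The accumulated column rises monotonically even though the individual dispatches differ by several times, which would be consistent with a timeline that appears to show no fluctuation until the view is switched to the delta.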

jrmadsen avatar Dec 13 '23 10:12 jrmadsen

Hi @sfantao, are you still experiencing this issue?

schung-amd avatar Oct 08 '24 13:10 schung-amd

Closing for now, as this issue is fairly old and things have likely changed in the meantime. If you're still experiencing this issue, feel free to comment here and we can reopen it, or you can submit a new issue if you prefer.

schung-amd avatar Nov 07 '24 16:11 schung-amd