
Filtering by block doesn't consider cross-block dependencies for metrics

Open skyreflectedinmirrors opened this issue 3 years ago • 4 comments

Specifically, we noticed this while trying to collect coalescing (which lives in the TCP section):

https://github.com/AMDResearch/omniperf/blob/62d130b458a21a2c964da234cf7a24420e01efe1/src/omniperf_cli/configs/gfx90a/1600_L1_cache.yaml#L20

but uses values from the TA (i.e., TA_TOTAL_WAVEFRONTS_sum).

So, if a user does:

omniperf profile -b TCP -n bar -- <foo>
omniperf analyze -p workloads/bar/mi200

the resulting Buffer Coalescing value in the L1 section will be empty.
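One way to picture the missing dependency step is the minimal sketch below. It is not omniperf's actual filtering code: the block list, the regex, the helper names, and the metric expression (including the `TCP_TOTAL_ACCESSES_sum` counter) are illustrative assumptions. It only shows that a `-b TCP` request could be widened to include every block whose counters appear in the metrics that TCP owns.

```python
import re

# Hypothetical list of IP block prefixes; the real tool's list may differ.
KNOWN_BLOCKS = {"SQ", "SQC", "TA", "TD", "TCP", "TCC"}

# Assumes counter names look like <BLOCK>_<REST>, e.g. TA_TOTAL_WAVEFRONTS_sum.
COUNTER_RE = re.compile(r"\b([A-Z][A-Z0-9]*)_[A-Za-z0-9_]+\b")


def blocks_in_expression(expr):
    """Return the known IP blocks whose counters appear in a metric expression."""
    return {m.group(1) for m in COUNTER_RE.finditer(expr)} & KNOWN_BLOCKS


def expand_block_filter(requested, metrics):
    """Widen a block filter so cross-block counters are also collected.

    metrics is a list of (owning_block, expression) pairs.
    """
    expanded = set(requested)
    for owner, expr in metrics:
        if owner in requested:
            expanded |= blocks_in_expression(expr)
    return expanded


# Illustrative expression only; not copied from the real .yaml config.
coalescing = "AVG(TA_TOTAL_WAVEFRONTS_sum * 64 * 100 / (TCP_TOTAL_ACCESSES_sum * 4))"
blocks = expand_block_filter({"TCP"}, [("TCP", coalescing)])
```

With the inputs above, `blocks` contains both `TCP` and `TA`, so the TA counters needed by the coalescing metric would be collected as well.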

skyreflectedinmirrors avatar Nov 10 '22 20:11 skyreflectedinmirrors

Ah, good catch. Thanks for reporting this.

We'll have to refine the IP block filtering logic to account for metrics like this one that reference other blocks. We'll add this to the next release.

coleramos425 avatar Nov 11 '22 21:11 coleramos425

Adding this to a future milestone. IP Block, dispatch, and kernel filtering are going to be overhauled when we introduce alternative profiling to users.

This alternative profiling option will introduce a single output CSV, which makes organizing logical IP blocks much easier. It will also eliminate the issue we have with metrics that use counters from different blocks, like our Memory Chart. A similar issue is described below.

The issue was that these metrics used SQ_ACCUM_PREV_HIRES, a counter that is generated in several IP blocks, so we needed to specify in the .yaml configs which CSV to pull the counter from.

Another issue exists with these two cache latencies

[image: screenshot of the two cache latency metrics]

The expressions for these metrics use counters from two IP blocks.

e.g., L1D Cache Latency = AVG(SQ_ACCUM_PREV_HIRES [from SQ_IFETCH_LEVEL] / SQC_DCACHE_REQ [from pmc_perf])

The coll_level fix used above won't work for these, because two different CSVs would need to be specified and coll_level only lets us specify one. We could either:

  • Reorder performance counters in rocprof perfmon config
  • Modify cli tool's coll_level implementation
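As a rough illustration of the second option, the sketch below resolves each counter from its own CSV rather than applying one coll_level to the whole expression. It is a minimal sketch under stated assumptions, not the tool's implementation: `resolve_counters`, the `sources` mapping, the in-memory `tables` layout, and all sample values are hypothetical.

```python
from statistics import mean


def resolve_counters(names, tables, sources, default="pmc_perf"):
    """Pick each counter's column from the CSV named in `sources`.

    tables:  csv name -> {counter name: list of per-dispatch values}
    sources: counter name -> csv name (hypothetical per-counter coll_level)
    """
    return {name: tables[sources.get(name, default)][name] for name in names}


def l1d_cache_latency(tables, sources):
    """AVG(SQ_ACCUM_PREV_HIRES / SQC_DCACHE_REQ) with per-counter sources."""
    cols = resolve_counters(["SQ_ACCUM_PREV_HIRES", "SQC_DCACHE_REQ"], tables, sources)
    ratios = [
        accum / req
        for accum, req in zip(cols["SQ_ACCUM_PREV_HIRES"], cols["SQC_DCACHE_REQ"])
        if req != 0  # skip dispatches with no requests
    ]
    return mean(ratios) if ratios else None


# Made-up sample data: the same counter name appears in more than one CSV.
tables = {
    "SQ_IFETCH_LEVEL": {"SQ_ACCUM_PREV_HIRES": [200.0, 300.0]},
    "pmc_perf": {
        "SQ_ACCUM_PREV_HIRES": [1.0, 1.0],
        "SQC_DCACHE_REQ": [10.0, 10.0],
    },
}
sources = {"SQ_ACCUM_PREV_HIRES": "SQ_IFETCH_LEVEL"}  # others use the default

latency = l1d_cache_latency(tables, sources)
```

Here `SQ_ACCUM_PREV_HIRES` is read from the `SQ_IFETCH_LEVEL` table while `SQC_DCACHE_REQ` falls back to `pmc_perf`, which is the combination a single coll_level value cannot express.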

coleramos425 avatar Dec 12 '22 21:12 coleramos425

Closing since ticket is no longer relevant. Thanks!

ppanchad-amd avatar Oct 04 '24 19:10 ppanchad-amd

This is definitely still relevant

skyreflectedinmirrors avatar Oct 04 '24 19:10 skyreflectedinmirrors