Question on TCP_TOTAL_CACHE_ACCESSES_sum and L1cache_data

Open francescosalvadore opened this issue 10 months ago • 1 comments

Describe your question

For a simple memory-bound kernel (e.g., a matrix linear combination), I would expect the HBM traffic to match the L1 (hit + miss) traffic, since most of the accesses are misses. However, I found that L1cache_data is twice the HBM value. In roofline_calc.py, L1 data is computed as:

L1cache_data += df["TCP_TOTAL_CACHE_ACCESSES_sum"][idx] * 64

Is 64 the correct number of bytes per access? I am using an MI250X GCD.

Additional context

No response

francescosalvadore avatar Mar 09 '25 18:03 francescosalvadore

Hi @francescosalvadore. Internal ticket has been created to assist with your question. Thanks!

ppanchad-amd avatar Mar 10 '25 17:03 ppanchad-amd

Hi @francescosalvadore,

TCP_TOTAL_CACHE_ACCESSES_sum is the total number of lines loaded from cache per unit time, so we need to count hits and misses. It's multiplied by 64 because the cache line on MI250X is 64 bytes.
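To make the arithmetic concrete, here is a minimal sketch of the same conversion outside of roofline_calc.py. The function name `l1_cache_bytes` and the counter values are hypothetical and used only for illustration; the only facts taken from this thread are the counter name and the 64-byte line size.

```python
# Cache line size in bytes on MI250X, as stated in the reply above.
CACHE_LINE_BYTES = 64


def l1_cache_bytes(tcp_total_cache_accesses):
    """Convert per-dispatch access counts (hits + misses) to total bytes.

    Mirrors the accumulation `L1cache_data += accesses * 64` quoted in the
    question; this helper itself is hypothetical, not part of omniperf.
    """
    return sum(count * CACHE_LINE_BYTES for count in tcp_total_cache_accesses)


# Hypothetical TCP_TOTAL_CACHE_ACCESSES_sum values for three dispatches
# (illustrative numbers, not measured data).
accesses = [1_000, 2_000, 500]
print(l1_cache_bytes(accesses))
```

Because every access is counted whether it hits or misses, the resulting byte total reflects traffic requested from L1 and is not required to equal the bytes fetched from HBM.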

benrichard-amd avatar May 14 '25 15:05 benrichard-amd