Question on TCP_TOTAL_CACHE_ACCESSES_sum and L1cache_data
Describe your question
For a simple memory-bound kernel (e.g., a linear combination of matrices), I would expect the HBM data to be about the same as the L1 (hit + miss) data, since almost all of the accesses are misses. However, I found that L1cache_data is twice the HBM value. In roofline_calc.py, the L1 data is computed as:
L1cache_data += df["TCP_TOTAL_CACHE_ACCESSES_sum"][idx] * 64
Is it correct to use 64 bytes as the size of each access? I am using a single MI250X GCD.
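A minimal sketch of the comparison described in the question, in Python like roofline_calc.py. It assumes the per-kernel counters are available as a list of dicts (one per kernel) rather than the tool's DataFrame, and `hbm_bytes` is a hypothetical field standing in for the HBM traffic, since how the tool computes that value is not shown here. The 64-byte multiplier mirrors the line quoted above.

```python
# Bytes per L1 access, mirroring the "* 64" in roofline_calc.py.
L1_LINE_BYTES = 64


def l1_cache_bytes(kernels):
    """Sum L1 traffic in bytes over all kernels, equivalent to
    L1cache_data += df["TCP_TOTAL_CACHE_ACCESSES_sum"][idx] * 64
    applied to each kernel row."""
    total = 0
    for row in kernels:
        total += row["TCP_TOTAL_CACHE_ACCESSES_sum"] * L1_LINE_BYTES
    return total


def l1_to_hbm_ratio(kernels):
    """Ratio of L1 bytes to HBM bytes.

    "hbm_bytes" is a hypothetical field; the question expects this ratio
    to be near 1.0 for a streaming kernel whose L1 accesses are mostly misses.
    """
    hbm = sum(row["hbm_bytes"] for row in kernels)
    return l1_cache_bytes(kernels) / hbm
```

With hypothetical counts of 1000 L1 accesses and 32000 HBM bytes for one kernel, `l1_to_hbm_ratio` returns 2.0, the kind of mismatch reported above.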
Hi @francescosalvadore. An internal ticket has been created to assist with your question. Thanks!
Hi @francescosalvadore,
TCP_TOTAL_CACHE_ACCESSES_sum is the total number of cache lines accessed in the L1 (TCP) cache during the kernel, so it counts both hits and misses. It is multiplied by 64 because the L1 cache line on MI250X is 64 bytes.
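A minimal sketch of the arithmetic in the answer: hits and misses are summed, and each access is counted as one 64-byte line. The `hits` and `misses` parameters are hypothetical inputs for illustration, not the names of specific hardware counters.

```python
# L1 (TCP) cache line size on MI250X, as stated in the answer.
L1_LINE_BYTES = 64


def l1_bytes_from_accesses(hits, misses, line_bytes=L1_LINE_BYTES):
    """Convert L1 hit and miss counts into bytes of L1 traffic."""
    # Total accesses = hits + misses; the answer says
    # TCP_TOTAL_CACHE_ACCESSES_sum counts both.
    total_accesses = hits + misses
    # The roofline calculation counts each access as one full cache line.
    return total_accesses * line_bytes
```

For example, 10 hits and 90 misses give `l1_bytes_from_accesses(10, 90) == 6400` bytes.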