te42kyfo comments

Results: 15 comments of te42kyfo

TCC_HIT_sum for MI210 is only half of MI100 when L2CacheHit is ~100%

I observed the same. On MI100, (TCC_HIT_sum + TCC_MISS_sum) * 32 matched the expected L2 cache data volume. On MI210, this expression results in exactly half of what is expected....
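
The counter arithmetic in this comment can be written as a small helper. This is a minimal sketch, not part of the original benchmark: the function name `l2_volume_bytes` and the `bytes_per_request` parameter are hypothetical. The default of 32 is the factor that reproduced the expected volume on MI100 according to the comment; the correct factor for MI210 is not confirmed here.

```cpp
#include <cstdint>

// Estimate the L2 data volume in bytes from the TCC hit and miss counters.
// bytes_per_request is an assumption: 32 matched the expected volume on
// MI100 in the comment above; the value for other GPUs is not confirmed.
constexpr std::uint64_t l2_volume_bytes(std::uint64_t tcc_hit_sum,
                                        std::uint64_t tcc_miss_sum,
                                        std::uint64_t bytes_per_request = 32) {
    return (tcc_hit_sum + tcc_miss_sum) * bytes_per_request;
}
```

If the estimate comes out at exactly half of the expected volume, doubling `bytes_per_request` would make the numbers agree, but that is only an arithmetic observation and not a documented property of the hardware counters.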

Problem about bandwidth test

At 1024*1024 elements the total data volume per array is 1024*1024 * sizeof(double) = 8MB. The 4080Ti has 32MB of L2 cache, so even the triad test, that uses 3...
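
The working-set arithmetic in this comment can be checked with a short sketch. The helper names `working_set_bytes` and `fits_in_l2` are hypothetical, and treating the quoted 32MB as 32 * 1024 * 1024 bytes is an assumption.

```cpp
#include <cstddef>

// Total bytes touched by a kernel that streams `arrays` arrays of
// `elements` doubles each (3 arrays for the triad case mentioned above).
constexpr std::size_t working_set_bytes(std::size_t elements, std::size_t arrays) {
    return elements * sizeof(double) * arrays;
}

// True if the whole working set is no larger than an L2 cache of l2_bytes.
constexpr bool fits_in_l2(std::size_t elements, std::size_t arrays,
                          std::size_t l2_bytes) {
    return working_set_bytes(elements, arrays) <= l2_bytes;
}
```

For 1024*1024 elements, `working_set_bytes(1024 * 1024, 1)` gives 8 MiB per array and `working_set_bytes(1024 * 1024, 3)` gives 24 MiB for three arrays, which is below the 32MB of L2 quoted in the comment.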

Problem about bandwidth test

As far as I could google, the A800 is made from two GA100 chips, each of which has 40MB of cache. You should also be able to query that as...

Problem about bandwidth test

You are right, my info was faulty. The A800 is just one model based on the GA100 chip, which has 40MB L2. I just googled really quickly because I haven't...

First of all, if you want to see more data, you can uncomment the lines 81-91:
```
measureDRAMBytesStart();
callKernel(blockCount, blockRun);
auto metrics = measureDRAMBytesStop();
dram_read.add(metrics[0]);
dram_write.add(metrics[1]);
measureL2BytesStart();
callKernel(blockCount, blockRun);
metrics...
```

Problem about l2 cache test

That's a fun one! `localSum += B[idx];` results in assembly like this ([godbolt](https://godbolt.org/z/P8qKE4bsa), N=4 to make it shorter):
```
LDG.E.64.CONSTANT R6, [R4.64]
LDG.E.64.CONSTANT R10, [R8.64]
LDG.E.64.CONSTANT R14, [R12.64]
LDG.E.64.CONSTANT R16, [R16.64]...
```

te42kyfo

TCC_HIT_sum for MI210 is only half of MI100 when L2CacheHit is ~100%

Problem about bandwidth test

Problem about bandwidth test

Problem about bandwidth test

Problem about l2 cache test

Problem about l2 cache test

can not run on gfx1100 --> rx7900

can not run on gfx1100 --> rx7900

can not run on gfx1100 --> rx7900

What's the difference between cuda-l2-cache and gpu-cache benchmarks?