hardware-effects
4K aliasing correct perf counter
I am curious what the 'correct' perf counter for 4K aliasing is. You have mentioned ld_blocks.store_forward, but I was wondering about the other counter, ld_blocks_partial.address_alias, as well.
Here is the perf list description:
ld_blocks.store_forward
[loads blocked by overlapping with store buffer that cannot be forwarded]
ld_blocks_partial.address_alias
[False dependencies in MOB due to partial compare on address]
Here are the perf results on my machine:
$ perf stat -e ld_blocks_partial.address_alias,ld_blocks.store_forward ./a.out 4096
222
Performance counter stats for './a.out 4096':
6,852 ld_blocks_partial.address_alias:u
32 ld_blocks.store_forward:u
0.224647447 seconds time elapsed
$ perf stat -e ld_blocks_partial.address_alias,ld_blocks.store_forward ./a.out 4092
359
Performance counter stats for './a.out 4092':
132,139,399 ld_blocks_partial.address_alias:u
2,097,093 ld_blocks.store_forward:u
0.361229917 seconds time elapsed
As you can see, both of them are hugely different for 4092 and 4096.
Good point, I remember that I was also thinking about this, but it was some time ago :sweat_smile: From ld_blocks_partial.address_alias and ld_blocks.store_forward, I suspect that store_forward reports the actual cases where forwarding was blocked, while address_alias reports cases where the block was due to "false" aliasing (i.e. forwarding would have been possible, but there was an alias). But the two counters are obviously incremented in different situations, because their values are vastly different.
I would have to refresh my memory on this in more detail; my knowledge of this area is not very deep :) This is just a (probably wrong) guess.
@travisdowns any hints? :)
Yes, I think ld_blocks_partial.address_alias counts cases where there was an initial "hit" in the store buffer loose net (i.e., the CPU thinks a load is going to forward from a store), but when the full address was then compared in the fine net, the hit turned out to be spurious due to 4K aliasing. This question and its answers have some details about store forwarding, and in particular about the "fine net" and "loose net".
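To make the loose-net idea concrete, here is a minimal sketch of the address condition usually meant by "4K aliasing". It assumes the partial compare looks only at the low 12 address bits (the offset within a 4 KiB page); the real hardware check also involves access sizes and is not documented at this level of detail. The names kPageOffsetMask and is_4k_alias are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Assumption: the partial ("loose net") compare uses only the low 12 bits
// of the address, i.e. the offset within a 4 KiB page.
constexpr std::uintptr_t kPageOffsetMask = 0xFFF;

// Returns true when the two addresses are different but share the same
// low 12 bits, so a compare on those bits alone would report a match
// that a full-address compare would reject.
constexpr bool is_4k_alias(std::uintptr_t load_addr, std::uintptr_t store_addr) {
    return load_addr != store_addr &&
           ((load_addr ^ store_addr) & kPageOffsetMask) == 0;
}

// Addresses exactly 4096 bytes apart alias; addresses 4 bytes apart do not.
static_assert(is_4k_alias(0x10000, 0x11000), "4096 bytes apart: same page offset");
static_assert(!is_4k_alias(0x10000, 0x10004), "4 bytes apart: different page offset");
```

Note that the predicate applies to a specific load/store pair that is in flight at the same time. Which pairs those are depends on the loop being measured, so this sketch alone does not explain the difference between the 4092 and 4096 runs above.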
ld_blocks.store_forward measures some other type of block related to store forwarding, although I'm not actually sure which. Maybe it counts cases where a load is predicted to forward but the data is not available, or where a load can't be forwarded because it overlaps a store but is not fully contained within it (example).
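The second guess above (a load that overlaps a store without being fully contained in it) can be written down as a simple byte-range check. This is only a sketch of that geometric condition, not a description of what the counter actually measures. The Access struct and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical description of one memory access: start address and size in bytes.
struct Access {
    std::uintptr_t addr;
    std::size_t size;
};

// True when the two byte ranges share at least one byte.
constexpr bool overlaps(Access a, Access b) {
    return a.addr < b.addr + b.size && b.addr < a.addr + a.size;
}

// True when every byte read by the load lies inside the store's byte range.
constexpr bool contained_in(Access load, Access store) {
    return load.addr >= store.addr &&
           load.addr + load.size <= store.addr + store.size;
}

// The case guessed at above: the load touches the store's bytes, but the
// store alone cannot supply all of the bytes the load needs.
constexpr bool partial_overlap(Access load, Access store) {
    return overlaps(load, store) && !contained_in(load, store);
}
```

For example, a 4-byte load at offset 2 of a 4-byte store satisfies partial_overlap, while a 4-byte load from the start of an 8-byte store does not.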