uiCA
simulation inaccuracy: missed dep-breaking of pcmpeq
Integer pcmpeq* instructions with source = destination set the destination to all-ones with no dependency on the register's previous value (although they still occupy an execution unit). uiCA appears to miss this dependency-breaking behavior. For example, the following loop runs at one cycle per iteration on Skylake, while uiCA predicts two:
loop:
    vpcmpeqd xmm0, xmm0, xmm0
    vpor xmm0, xmm0, xmm0
    dec ecx
    jnz loop
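
To illustrate where the two numbers could come from, here is a minimal latency-only sketch of the loop's dependency chains. It is not uiCA's code: the names `LOOP` and `cycles_per_iteration` are made up for this example, every instruction is assumed to have a latency of 1 cycle, and ports, the front end and issue order are ignored entirely.

```python
# Latency-only dataflow model of the loop above (hypothetical helper, not uiCA code).
# Assumption: vpcmpeqd, vpor, dec and jnz each have a latency of 1 cycle.
# Ports, front end and issue order are NOT modelled.

LOOP = [
    # (mnemonic, destinations, sources, latency, is pcmpeq with source = destination)
    ("vpcmpeqd", ["xmm0"], ["xmm0"], 1, True),
    ("vpor", ["xmm0"], ["xmm0"], 1, False),
    ("dec", ["ecx", "flags"], ["ecx"], 1, False),
    ("jnz", [], ["flags"], 1, False),
]


def cycles_per_iteration(loop, break_pcmpeq_dep, iterations=1000):
    """Estimate steady-state cycles per iteration from dependency chains only."""
    ready = {}   # register -> cycle at which its latest value becomes available
    finish = []  # latest ready time observed after each iteration
    for _ in range(iterations):
        for _name, dests, sources, latency, same_reg_pcmpeq in loop:
            if break_pcmpeq_dep and same_reg_pcmpeq:
                sources = []  # result is all-ones regardless of the old value
            start = max((ready.get(reg, 0) for reg in sources), default=0)
            for reg in dests:
                ready[reg] = start + latency
        finish.append(max(ready.values()))
    # Measure the slope over the second half to skip the warm-up iterations.
    half = iterations // 2
    return (finish[-1] - finish[half - 1]) / (iterations - half)


if __name__ == "__main__":
    print("dependency kept:  ", cycles_per_iteration(LOOP, break_pcmpeq_dep=False))
    print("dependency broken:", cycles_per_iteration(LOOP, break_pcmpeq_dep=True))
```

Under these assumptions, keeping the dependency gives a loop-carried chain of vpcmpeqd followed by vpor (2 cycles per iteration), which matches what uiCA predicts. Breaking it leaves only the dec ecx chain (1 cycle per iteration), which matches the Skylake measurement.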