Should sequentially consistent atomic (LOCK prefix) have 'fence' semantics?
Hello Intel developers,
I was wondering whether or not a LOCK CMPXCHG should actually count as a SFENCE given that the Intel maual in section 8.2 states the following...
The I/O instructions, locking instructions, the LOCK prefix, and serializing instructions force stronger ordering on the processor.
and...
Synchronization mechanisms in multiple-processor systems may depend upon a strong memory-ordering model. Here, a program can use a locking instruction such as the XCHG instruction or the LOCK prefix to ensure that a read-modify-write operation on memory is carried out atomically. Locking operations typically operate like I/O operations in that they wait for all previous instructions to complete and for all buffered writes to drain to memory (see Section 8.1.2, “Bus Locking”).
Hence I was wondering how pmemcheck handles cases like...
CLFLUSHOPT(&x);
atomic_compare_exchange_strong(&y, &z, w);
Right now it seems that CAS instructions are not instrumented to do_fence ...
https://github.com/pmem/valgrind/blob/584b19981f6ffcba4cc354e9adcc525fbea4ae51/pmemcheck/pmc_main.c#L1683-L1749
Should a CAS have a do_fence prior to it, since Valgrind transforms all LOCK prefixed instructions as CAS?
VEX/priv/guest_amd64_toIR.c [Cannot provide permalink due to file being too large]
/* LOCK prefixed instructions. These are translated using IR-level
CAS statements (IRCAS) and are believed to preserve atomicity, even
from the point of view of some other process racing against a
simulated one (presumably they communicate via a shared memory
segment).
Handlers which are aware of LOCK prefixes are:
dis_op2_G_E (add, or, adc, sbb, and, sub, xor)
dis_cmpxchg_G_E (cmpxchg)
dis_Grp1 (add, or, adc, sbb, and, sub, xor)
dis_Grp3 (not, neg)
dis_Grp4 (inc, dec)
dis_Grp5 (inc, dec)
dis_Grp8_Imm (bts, btc, btr)
dis_bt_G_E (bts, btc, btr)
dis_xadd_G_E (xadd)
*/
/* There are 3 cases to consider:
reg-reg: ignore any lock prefix, generate sequence based
on ITE
reg-mem, not locked: ignore any lock prefix, generate sequence
based on ITE
reg-mem, locked: use IRCAS
*/
I think you're right. The support for LOCK-prefixed instructions was on the list of future improvements since 2016. At least, I can see this bullet on my slides from LinuxCon 2016. ;-) Thanks for filing this issue.
Also, I think we should distinguish between LFENCE and SFENCE/MFENCE. Currently, they are all translated into Imbe_Fence and trigger do_fence. IMHO, only SFENCE/MFENCE should result in Flushed->Fenced transition.
Ooops! Looks like I was looking at some old revision. The LFENCE/SFENCE support has been already fixed.
As of the moment no new features are planned. Closing this issue for now. For further reading, please take a look at our blog post with more details.