valgrind icon indicating copy to clipboard operation
valgrind copied to clipboard

Should sequentially consistent atomic (LOCK prefix) have 'fence' semantics?

Open LouisJenkinsCS opened this issue 5 years ago • 3 comments

Hello Intel developers,

I was wondering whether or not a LOCK CMPXCHG should actually count as a SFENCE given that the Intel maual in section 8.2 states the following...

The I/O instructions, locking instructions, the LOCK prefix, and serializing instructions force stronger ordering on the processor.

and...

Synchronization mechanisms in multiple-processor systems may depend upon a strong memory-ordering model. Here, a program can use a locking instruction such as the XCHG instruction or the LOCK prefix to ensure that a read-modify-write operation on memory is carried out atomically. Locking operations typically operate like I/O operations in that they wait for all previous instructions to complete and for all buffered writes to drain to memory (see Section 8.1.2, “Bus Locking”).

Hence I was wondering how pmemcheck handles cases like...

CLFLUSHOPT(&x);
atomic_compare_exchange_strong(&y, &z, w);

Right now it seems that CAS instructions are not instrumented to do_fence ...

https://github.com/pmem/valgrind/blob/584b19981f6ffcba4cc354e9adcc525fbea4ae51/pmemcheck/pmc_main.c#L1683-L1749

Should a CAS have a do_fence prior to it, since Valgrind transforms all LOCK prefixed instructions as CAS?

VEX/priv/guest_amd64_toIR.c [Cannot provide permalink due to file being too large]

/* LOCK prefixed instructions.  These are translated using IR-level
   CAS statements (IRCAS) and are believed to preserve atomicity, even
   from the point of view of some other process racing against a
   simulated one (presumably they communicate via a shared memory
   segment).

   Handlers which are aware of LOCK prefixes are:
      dis_op2_G_E      (add, or, adc, sbb, and, sub, xor)
      dis_cmpxchg_G_E  (cmpxchg)
      dis_Grp1         (add, or, adc, sbb, and, sub, xor)
      dis_Grp3         (not, neg)
      dis_Grp4         (inc, dec)
      dis_Grp5         (inc, dec)
      dis_Grp8_Imm     (bts, btc, btr)
      dis_bt_G_E       (bts, btc, btr)
      dis_xadd_G_E     (xadd)
*/
   /* There are 3 cases to consider:

      reg-reg: ignore any lock prefix, generate sequence based
               on ITE

      reg-mem, not locked: ignore any lock prefix, generate sequence
                           based on ITE

      reg-mem, locked: use IRCAS
   */

LouisJenkinsCS avatar Apr 21 '20 15:04 LouisJenkinsCS

I think you're right. The support for LOCK-prefixed instructions was on the list of future improvements since 2016. At least, I can see this bullet on my slides from LinuxCon 2016. ;-) Thanks for filing this issue.

krzycz avatar Apr 22 '20 11:04 krzycz

Also, I think we should distinguish between LFENCE and SFENCE/MFENCE. Currently, they are all translated into Imbe_Fence and trigger do_fence. IMHO, only SFENCE/MFENCE should result in Flushed->Fenced transition.

krzycz avatar Apr 23 '20 09:04 krzycz

Ooops! Looks like I was looking at some old revision. The LFENCE/SFENCE support has been already fixed.

krzycz avatar Apr 23 '20 10:04 krzycz

As of the moment no new features are planned. Closing this issue for now. For further reading, please take a look at our blog post with more details.

wlemkows avatar Mar 13 '23 10:03 wlemkows