failure icon indicating copy to clipboard operation
failure copied to clipboard

Does the atomic load in backtrace need to be SeqCst?

Open withoutboats opened this issue 7 years ago • 4 comments

I don't believe its necessary for correctness in any meaningful sense, we could use Relaxed or AcqRel.

I believe that reducing it to AcqRel would be a strict performance improvement; I believe AcqRel is enough to make sure that is_backtrace_enabled only runs once. However, as with atomics always, I am not confident that I am right.

Lowering it further to Relaxed might be worth it; occassionally running is_backtrace_enabled more than once could be worth having cheaper loads over the whole course of program execusion.

cc @alexcrichton who wrote the original code

withoutboats avatar Jan 28 '18 01:01 withoutboats

AFAIK the SeqCst bit isn't required, I just tend to be very wary of non-SeqCst orderings in that it's quite rare that they come up in profiles and it's also quite rare that they're correct if they're not SeqCst (in my experience at least)

alexcrichton avatar Jan 28 '18 03:01 alexcrichton

Lowering it further to Relaxed might be worth it; occassionally running is_backtrace_enabled more than once could be worth having cheaper loads over the whole course of program execusion.

+1 on this. is_backtrace_enabled doesn't seem to be thaaat expensive, and computing it more than once always should return the same value. If it doesn't it is because the program changed the value of the environment variable during execution, and this should only be done in a synchronized way anyways.

gnzlbg avatar Jan 28 '18 09:01 gnzlbg

I believe that as currently written, your code can already run is_backtrace_enabled more than once. This is unrelated to acquire/release ordering and remains true even under sequentially consistent execution:

  • Thread 1 comes in, loads the value of ENABLED (0), and blocks.
  • Thread 2 comes in, loads the value of ENABLED (still 0), and does the remainder of the initialization.
  • Thread 1 wakes up and repeats the initialization work that was done by thread 2.

If you really wanted to make sure that initialization is never carried out more than once, you would need to mutate the value of ENABLED with a read-modify-write instruction (e.g. enabled.swap(SOME_BUSY_STATE_MARKER)) in order to tell the threads coming after you that you are already doing the work and they should wait. But then you would be getting dangerously close to what std::sync::Once does, and it would be harder to justify rolling your own version.

As long as duplicate initialization is safe, cheap, and infrequent, I would personally not bother, and just make the atomic operations Relaxed. Stricter orderings like Acquire and Release are for cases where you access multiple memory locations and need to synchronize these accesses with each other, which is not the case here. When you only access a single shared memory location, atomicity and cache coherence are the only properties that you need, and for that Relaxed is the correct choice.

HadrienG2 avatar Feb 24 '18 16:02 HadrienG2

@alexcrichton As an aside, I would respectfully disagree with your decision of using SeqCst as a default. Personally, when I meet SeqCst ordering in someone's code, what I get out of it is that at least one of the two following properties must hold true:

  • The synchronization protocol is complicated and somehow depends on precise execution order, rather than leveraging simpler message-passing semantics (which only require Acquire/Release).
  • The person who implemented the synchronization protocol did not have a firm understanding of it and used SeqCst as a magical incantation to be on the safe side.

I am wary of complicated interleaving-sensitive synchronization protocols, because they are both less efficient on the machine side and harder to reason about on the human side (consider that for N concurrent instructions, there are N! possible interleavings). And for obvious reasons, I am even more wary of people who play with atomics without a firm understanding of what they are doing. So already, when I see this ordering, I know that I will have a hard time trusting the code.

In constrast, I think that a simpler synchronization protocol based on Relaxed, Acquire and Release inspires more confidence, because it is easier to see what the author intended (as the desired synchronization is documented in the precise choice of memory orderings), and to check whether the resulting synchronization protocol is correct or not.

HadrienG2 avatar Feb 24 '18 16:02 HadrienG2