rfcs Prove the correctness of a work-stealing deque

Rendered Implementation

This is a linearization proof of the Chase-Lev work-stealing deque. To the best of my knowledge, it is the first publicly available linearization proof (attempt) of the Chase-Lev deque :)

Jan 08 '18 22:01 jeehoonkang

Hey, this looks amazing! :) I haven't read the proof in detail yet, but here go a few quick comments/questions:

Are you going to eventually turn this RFC into a research paper?
Is anyone outside the Crossbeam project aware of this proof? If so, have they read/verified it?
Do you have any benchmark numbers comparing the current and the new deque implementation?
Have you tried verifying the new implementation using CDS Checker? I believe it'd be easy to modify this C++ code to use the new memory orderings and run the checker on it.

Jan 08 '18 23:01 ghost

I'd like to write a paper out of https://github.com/jeehoonkang/crossbeam-rfcs/blob/deque-proof/text/2017-07-23-relaxed-memory.md and this. Some of my colleagues in academia (including my supervisor) are also reading this proof.

I don't have any benchmark numbers yet.. I'll prepare soon before merging https://github.com/crossbeam-rs/crossbeam-deque/pull/2.

I couldn't properly run CDSChecker.. I followed the instruction in https://github.com/computersforpeace/model-checker, but it always say OUT OF BOOTSTRAP MEMORY when running ./run.sh. Do you have any idea?

Jan 09 '18 00:01 jeehoonkang

I just fixed a few bugs, but I don't think all the bugs are squashed right now. I'll reread the proof 2-3 more times.

Jan 09 '18 20:01 jeehoonkang

I think the RFC itself is now ready to be merged. I checked the proof several times, and updated it. @stjepang please have a look!

Though the implementation needs some more work:

Benchmark: Currently crossbeam-deque doesn't have any benchmark. It's better to make one, and see how this RFC improves (or degrades!) the performance.
Model checking: it's better to run CDSChecker or other model checkers against the proposed implementation.

Jan 13 '18 07:01 jeehoonkang

@stjepang Thanks! I revised the document as you mentioned.

I'm confident with this proof. But since it will be deployed in production (Firefox), it's better to peer-review this proof, at least. Before merging it, I'd like to wait for my supervisor @gilhur to finish reading this proof.
This new orderings seem like a pure win over the current implementation, especially so on weakly-ordered architectures.

I thought the same thing, but it turns out that the original and the new version makes almost identical x86 and ARM binary. In particular, I thought that a CAS w/ release/acquire orderings for success/failure cases is more efficient than CAS w/ seqcst/relaxed, but they are actually compiled to the same instruction in x86 and ARMv8. (Also in ARMv7 for GCC, while theoretically the former can be more efficiently compiled).

As you suggested in IRC, I'll benchmark this branch w/ rayon.breadth_first(). I don't expect a huge win in even weaker architectures, though.

Still I believe it is good to merge this branch, because ~~it'll give me a better chance for paper acceptance~~ it more clearly reveals the synchronizations conducted in this data structure. In particular, in my opinion, it's hard to understand the meaning of SeqCst load/store/rmw. What do you think?
If you agree, I'd like to delete is_pinned and replace it with pin_fence.

I think we can do it without an RFC, since it doesn't change the existing API. But maybe pin_fence is not the best name for it. Here I'll trust your judgment :)

Jan 18 '18 14:01 jeehoonkang

@jeehoonkang Any news on this issue? How's the review of this proof going?

Jun 16 '18 15:06 ghost

Sorry for inactivity these days. My colleage and I are working on a different paper, so revising this proof had been postponed probably until mid July. I guess the merge of this PR is not urgent, right? I'll definitely come back and merge this PR :)

Jun 17 '18 00:06 jeehoonkang

Just checking, take your time. :)

Jun 17 '18 02:06 ghost