Michael-Scott queue: safe and unsafe versions
This PR follows up on PR #122: the optimizations and changes made there are split in two, to provide both a safe Michael-Scott queue and an "unsafe" one (which uses `Obj.magic`).
TODO:
- [x] For now, the `dscheck` tests for the unsafe version get segmentation faults.
- [x] Documentation
@polytypic: The dscheck tests of the unsafe version result in a segmentation fault. I encountered the same issue with PR #122 after rebasing on main and adding the few necessary changes. Could you have a look?
@polytypic: this PR is ready for review. Could you have a look?
Looking at the "safe" version, the push logic:
```ocaml
(* ... *)
let push { tail; _ } value =
  let rec find_tail_and_enq curr_end node =
    if not (Atomic.compare_and_set curr_end Nil node) then
      match Atomic.get curr_end with
      | Nil -> find_tail_and_enq curr_end node
      | Next (_, n) -> find_tail_and_enq n node
  in
  let new_tail = Atomic.make Nil in
  let newnode = Next (value, new_tail) in
  let old_tail = Atomic.get tail in
  find_tail_and_enq old_tail newnode;
  if not (Atomic.compare_and_set tail old_tail new_tail) then
    fix_tail tail new_tail
```
doesn't have some of the (entirely safe) optimizations from the "unsafe" version:
```ocaml
(* ... *)
let push { tail; _ } value =
  let (Next _ as new_node : (_, [ `Next ]) Node.t) = Node.make value in
  let old_tail = Atomic.get tail in
  let link = Node.as_atomic old_tail in
  if Atomic.compare_and_set link Nil new_node then
    Atomic.compare_and_set tail old_tail new_node |> ignore
  else
    let backoff = Backoff.once Backoff.default in
    push tail link new_node backoff
```
In particular, notice how the "unsafe" version does not call any recursive functions in the fast path:
```ocaml
let push { tail; _ } value =
  let (Next _ as new_node : (_, [ `Next ]) Node.t) = Node.make value in
  let old_tail = Atomic.get tail in
  let link = Node.as_atomic old_tail in
  (* first we try the fast path *)
  if Atomic.compare_and_set link Nil new_node then
    Atomic.compare_and_set tail old_tail new_node |> ignore
  else
    (* failed, so we do the slow path *)
```
It should be possible to incorporate this optimization into the "safe" version and it might speed it up a tiny bit.
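A sketch of how the fast path could be folded into the safe version. The type and helper names below are illustrative, and `fix_tail` is a plausible simplification rather than necessarily the PR's exact code:

```ocaml
(* Illustrative types following the safe version's shape: the queue's
   tail points at the link cell of the (presumed) last node. *)
type 'a node = Nil | Next of 'a * 'a node Atomic.t
type 'a t = { head : 'a node Atomic.t Atomic.t; tail : 'a node Atomic.t Atomic.t }

let create () =
  let link = Atomic.make Nil in
  { head = Atomic.make link; tail = Atomic.make link }

(* Plausible stand-in for the PR's fix_tail: retry moving the tail
   pointer forward as long as new_tail is still the last link. *)
let rec fix_tail tail new_tail =
  let old_tail = Atomic.get tail in
  if Atomic.get new_tail == Nil
     && not (Atomic.compare_and_set tail old_tail new_tail)
  then fix_tail tail new_tail

let push { tail; _ } value =
  let new_tail = Atomic.make Nil in
  let new_node = Next (value, new_tail) in
  let old_tail = Atomic.get tail in
  (* Fast path: the tail pointer is usually accurate, so first try a
     single CAS on the link it points at. *)
  if Atomic.compare_and_set old_tail Nil new_node then
    Atomic.compare_and_set tail old_tail new_tail |> ignore
  else begin
    (* Slow path: walk forward to the real end, as in the safe version. *)
    let rec find_tail_and_enq curr_end node =
      if not (Atomic.compare_and_set curr_end Nil node) then
        match Atomic.get curr_end with
        | Nil -> find_tail_and_enq curr_end node
        | Next (_, n) -> find_tail_and_enq n node
    in
    find_tail_and_enq old_tail new_node;
    if not (Atomic.compare_and_set tail old_tail new_tail) then
      fix_tail tail new_tail
  end
```

When uncontended, this performs exactly one CAS on the link plus one CAS on the tail, and the recursive traversal is only entered after a failed CAS.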
About the missing safe optimizations: I added the fast path in the push function. However, the benchmarks suggest that adding backoff (either in `fix_tail` or `find_tail_and_enq`) does not improve performance (at least on my computer). Did I miss other safe optimizations?
Hmm... One could also leave the backoff out. It is possible that the backoff helps on other CPU (micro)architectures and we can add that later.
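For reference, threading a backoff through the safe slow path could look roughly like the following. The `Backoff` module below is a hypothetical stand-in for Saturn's backoff (the real one performs randomized exponential spin waits), included only to keep the sketch self-contained:

```ocaml
(* Hypothetical stand-in for Saturn's Backoff module, just for this sketch. *)
module Backoff = struct
  type t = int
  let default = 1
  let once t =
    (* spin proportionally to the current count, then double it *)
    for _ = 1 to t do Domain.cpu_relax () done;
    t * 2
end

type 'a node = Nil | Next of 'a * 'a node Atomic.t

(* Safe slow path with a backoff threaded through: back off only when
   the CAS failed due to contention on the same link, not when merely
   walking forward over already-linked nodes. *)
let rec find_tail_and_enq backoff curr_end node =
  if not (Atomic.compare_and_set curr_end Nil node) then
    match Atomic.get curr_end with
    | Nil ->
        let backoff = Backoff.once backoff in
        find_tail_and_enq backoff curr_end node
    | Next (_, n) -> find_tail_and_enq backoff n node
```

Whether this helps is contention- and hardware-dependent, which is consistent with the benchmarks above showing no improvement on one machine.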
Hi there! While I understand and appreciate the hard work that went into optimizing the queue, what is the recommended alternative for grabbing the elements of the queue now that snapshots are gone? A pop loop followed by a push loop?
As a side note, the removal of the snapshot mechanism wasn't just about optimization.
Here is a puzzle about the snapshot mechanism. The code below uses the previous 0.4.1 version of this library, which is the only published version of this library that has the MS queue with the snapshot mechanism and no (additional) space leaks. Let's first define a module alias:
```ocaml
module Q = Saturn.Queue
```
Here is a function that converts a "snapshot" into a list:
```ocaml
let[@tail_mod_cons] rec to_list s =
  match Q.next s with
  | None -> []
  | Some (v, s) -> v :: to_list s
```
Question A: What is the return value of the program below?
```ocaml
let q = Q.create () in
let s = Q.snapshot q in
Q.push q 101;
ignore (Q.pop q);
to_list s
```
Question B: What is the return value of the program below?
```ocaml
let q = Q.create () in
Q.push q 101;
let s = Q.snapshot q in
ignore (Q.pop q);
Q.push q 42;
ignore (Q.pop q);
to_list s
```
<details>
<summary>Click to see answer and explanation</summary>

The first program returns `[]` and the second returns `[101; 42]`. Note that there is no point in time at which the queue in the second program contains both `101` and `42`.

To me, a "snapshot" should be an immutable instant. But that is not how the mechanism worked. Instead, `snapshot` simply returned a pointer to the first non-dummy node (if any) of the queue, and `next` then dereferenced the (mutable) pointer to the next node.

This meant that if the queue happened to be empty, the snapshot would be empty. If the queue happened to be non-empty, the snapshot would contain all elements pushed to the queue at that point and potentially all the elements pushed to the queue in the future.

This behaviour of the snapshot mechanism is, of course, not limited to simple sequential programs: concurrent pushes to the queue are also potentially visible through the "snapshot".

</details>
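The mechanism can be modeled in a few lines. The following is a simplified, sequential sketch (not the actual 0.4.1 implementation, which also involves a dummy node and atomic head/tail updates), but it reproduces the answers to both questions:

```ocaml
type 'a node = Nil | Next of 'a * 'a node Atomic.t

(* head and tail hold the link cells before the first element and
   after the last element, respectively *)
type 'a t = { mutable head : 'a node Atomic.t; mutable tail : 'a node Atomic.t }

let create () =
  let link = Atomic.make Nil in
  { head = link; tail = link }

let push q v =
  let link = Atomic.make Nil in
  Atomic.set q.tail (Next (v, link));
  q.tail <- link

let pop q =
  match Atomic.get q.head with
  | Nil -> None
  | Next (v, n) -> q.head <- n; Some v

(* A "snapshot" is merely the node currently at the head... *)
let snapshot q = Atomic.get q.head

(* ...and next follows the mutable links, so pushes that happen after
   the snapshot was taken remain visible through it. *)
let next = function
  | Nil -> None
  | Next (v, link) -> Some (v, Atomic.get link)
```

In question A, `snapshot` captures the immutable `Nil` value, so the later push is invisible; in question B it captures the node holding `101`, whose mutable link later comes to point at `42`.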
Good to know! Given the nature of a concurrent queue, I was never expecting strict control over the interleaving of calls, but this is clearly the opposite of what was advertised.
I think it'd be nice to have a way to get through the elements in the queue without removing them, basically a deeper peek API.
I might see if I can propose a PR for that.
Note that providing a snapshot operation can be easier for some other lock-free queue approaches.
The "lazy" queue, for example, makes it particularly easy.
The approach using two (essentially immutable) stacks also makes it relatively easy to take a snapshot of the queue. The Picos project has an implementation of a two-stack multi-producer, multi-consumer queue that should be relatively easy to extend with a snapshot operation.
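For instance, here is a sequential sketch of the two-stack idea (not the Picos implementation): because the two lists are immutable, capturing them yields a true point-in-time snapshot, unaffected by later operations. A lock-free version would keep the pair behind a single `Atomic.t` updated with `compare_and_set`.

```ocaml
(* Two-list queue: elements are pushed onto `back` and popped from
   `front`; when `front` runs out, `back` is reversed into it. *)
type 'a t = { mutable front : 'a list; mutable back : 'a list }

let create () = { front = []; back = [] }

let push q v = q.back <- v :: q.back

let pop q =
  match q.front with
  | v :: f -> q.front <- f; Some v
  | [] ->
      (match List.rev q.back with
       | [] -> None
       | v :: f ->
           q.back <- [];
           q.front <- f;
           Some v)

(* The lists themselves are immutable, so this pair is a genuine
   point-in-time snapshot: later pushes and pops cannot change it. *)
let snapshot q = (q.front, q.back)

let snapshot_to_list (front, back) = front @ List.rev back
```

Taking the snapshot is O(1); only converting it to a list costs O(n), and that cost is paid outside the queue's critical operations.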
Also, the queue provided with Kcas has snapshot functionality.