Optimize Michael-Scott queue and remove `snapshot` mechanism
This PR optimizes the Michael-Scott queue implementation. The main optimizations are avoiding false sharing and a new node representation that "inlines" the atomic into the node record, which effectively halves the length of the linked list of nodes and improves performance dramatically.
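As a rough illustration of the false-sharing side, here is a hedged sketch with simplified types (not the code from this PR). It assumes `Multicore_magic.copy_as_padded` from the multicore-magic library; the PR's actual mechanism may differ. The idea is to keep the frequently updated `head` and `tail` atomics in separately padded heap blocks:

```ocaml
(* Sketch only, not the code from this PR.  The [head] and [tail] atomics
   of a Michael-Scott queue are hammered by different domains, so placing
   each in its own padded heap block avoids false sharing between
   producers and consumers. *)
type 'a node = Nil | Next of { mutable next : 'a node; mutable value : 'a option }
type 'a t = { head : 'a node Atomic.t; tail : 'a node Atomic.t }

let create () =
  (* The queue starts with a single sentinel node pointed to by both ends. *)
  let sentinel = Next { next = Nil; value = None } in
  Multicore_magic.copy_as_padded
    {
      head = Multicore_magic.copy_as_padded (Atomic.make sentinel);
      tail = Multicore_magic.copy_as_padded (Atomic.make sentinel);
    }
```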
This PR also removes the snapshot mechanism. The snapshot mechanism is not used in any tests in this repository. Furthermore, it cannot be made to work efficiently with the traditional Michael-Scott queue representation, where the head points to a node whose value has already been taken from the queue: to avoid a space leak, the value in the head node must be overwritten, and that overwriting breaks the snapshot mechanism. The previous implementation avoided this problem by using an inefficient node representation that effectively made the linked list of nodes twice as long and operations on the queue twice as slow.
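To make the space-leak argument concrete, here is a hedged sketch of a pop over the simplified types from the sketch above (again, not the PR's actual code; in particular the real implementation accesses the `next` link atomically):

```ocaml
(* Sketch only, reusing the simplified [node]/[t] types above.  After a
   successful pop the node that held the value stays reachable as the new
   sentinel, so its [value] slot must be cleared to avoid keeping the
   popped value alive.  That clearing is exactly what a snapshot of the
   node list cannot survive: an earlier pointer into the list would now
   observe an emptied slot. *)
let rec pop (t : 'a t) : 'a option =
  let old_head = Atomic.get t.head in
  match old_head with
  | Nil -> assert false (* the head always points at a sentinel node *)
  | Next sentinel -> (
      match sentinel.next with
      | Nil -> None (* the queue is empty *)
      | Next r as new_head ->
          if Atomic.compare_and_set t.head old_head new_head then begin
            let v = r.value in
            r.value <- None; (* clear the slot of the new sentinel *)
            v
          end
          else pop t)
```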
Here is a run of the benchmarks on my M3 Max before the optimizations:
➜ saturn git:(rewrite-bench) ✗ dune exec --release -- ./bench/main.exe -budget 1 'free Queue' | jq '[.results.[].metrics.[] | select(.name | test("over")) | {name, value}]'
[
  {
    "name": "messages over time/one domain",
    "value": 25.832570841950652
  },
  {
    "name": "messages over time/1 nb adder, 1 nb taker",
    "value": 30.775546858386015
  },
  {
    "name": "messages over time/1 nb adder, 2 nb takers",
    "value": 28.61298006857959
  },
  {
    "name": "messages over time/2 nb adders, 1 nb taker",
    "value": 26.324158773037244
  },
  {
    "name": "messages over time/2 nb adders, 2 nb takers",
    "value": 26.72249173339718
  }
]
And here is a run after the optimizations:
➜ saturn git:(optimize-michael-scott-queue) ✗ dune exec --release -- ./bench/main.exe -budget 1 'free Queue' | jq '[.results.[].metrics.[] | select(.name | test("over")) | {name, value}]'
[
  {
    "name": "messages over time/one domain",
    "value": 49.09980418998089
  },
  {
    "name": "messages over time/1 nb adder, 1 nb taker",
    "value": 98.92424842899835
  },
  {
    "name": "messages over time/1 nb adder, 2 nb takers",
    "value": 73.48287261205158
  },
  {
    "name": "messages over time/2 nb adders, 1 nb taker",
    "value": 79.05834136653924
  },
  {
    "name": "messages over time/2 nb adders, 2 nb takers",
    "value": 70.05359476527885
  }
]
Here is a run of the benchmarks without the inlined node representation:
➜ saturn git:(optimize-michael-scott-queue) ✗ dune exec --release -- ./bench/main.exe -budget 1 'free Queue' | jq '[.results.[].metrics.[] | select(.name | test("over")) | {name, value}]'
[
  {
    "name": "messages over time/one domain",
    "value": 27.799662289702503
  },
  {
    "name": "messages over time/1 nb adder, 1 nb taker",
    "value": 46.095344138917085
  },
  {
    "name": "messages over time/1 nb adder, 2 nb takers",
    "value": 42.19755884308922
  },
  {
    "name": "messages over time/2 nb adders, 1 nb taker",
    "value": 63.98353506270374
  },
  {
    "name": "messages over time/2 nb adders, 2 nb takers",
    "value": 58.21147004626938
  }
]
As you can see, in the one domain and 1 nb adder, 1 nb taker configurations the inlined node representation is roughly twice as fast! In the more contended configurations the additional cores likely help to overcome some of the overhead of the non-inlined representation, while the additional contention brings down the performance of the inlined representation, so the difference is not quite as dramatic.
You can run `cp test/michael_scott_queue/michael_scott_queue_node.ml src_lockfree` or `git reset --hard HEAD~1` to use the non-inlined node representation:
modified src_lockfree/michael_scott_queue_node.ml
@@ -3,12 +3,10 @@ module Atomic = Transparent_atomic
 type ('a, _) t =
   | Nil : ('a, [> `Nil ]) t
   | Next : {
-      mutable next : ('a, [ `Nil | `Next ]) t;
+      next : ('a, [ `Nil | `Next ]) t Atomic.t;
       mutable value : 'a;
     }
       -> ('a, [> `Next ]) t

-let[@inline] make value = Next { next = Nil; value }
-
-external as_atomic : ('a, [ `Next ]) t -> ('a, [ `Nil | `Next ]) t Atomic.t
-  = "%identity"
+let[@inline] make value = Next { next = Atomic.make Nil; value }
+let[@inline] as_atomic (Next r : ('a, [ `Next ]) t) = r.next
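For context, here is a hypothetical helper (not part of this PR) showing why the two representations are interchangeable for the rest of the queue code: call sites only ever touch a node's link through `as_atomic`, so the same code compiles whether `as_atomic` is the `%identity` cast over the inlined mutable field or the plain record projection shown above.

```ocaml
(* Hypothetical usage sketch, assuming the [t], [make] and [as_atomic]
   definitions above are in scope (e.g. inside
   michael_scott_queue_node.ml).  [try_append] is not a function from this
   PR. *)
let try_append (last : ('a, [ `Next ]) t) value =
  let (node : ('a, [ `Nil | `Next ]) t) = make value in
  (* Succeeds only if [last] is still the last node of the list. *)
  Atomic.compare_and_set (as_atomic last) Nil node
```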