hydroflow
perf: avoid clone cost/figure out stream-by-reference mechanism
- [ ] `state()` op: #1669
- [ ] `lattice_bimorphism()` op/lattice lib #1301
- [ ] in general, in accumulating operators
Somewhat indirectly related to Flo syntax/DAGs (#1500)
Shadaj Laddad :hydro: 6:12 PM
found another instance where it would be very helpful to have borrowed references to bounded streams… In Paxos replicas, we have a piece of logic that takes in unordered sequenced elements, reshuffles them into sequential order, and then sends them to the KV-Store for processing. The reshuffling is done with batching logic: we take a batch of elements, find the ones that are next in order according to the highest previously seen sequence number, and then emit those. In our old implementation, we then process these into the KV-Store, and then pluck the “highest previously seen sequence number” back out of the KV-Store. Ideally, we would instead compute this value within the tick by doing processed_payloads.last.map(get sequence number), but because we also emit these payloads this requires a clone, which absolutely tanks the performance of replicas. With borrowed streams, we would be able to do this last + map at zero cost and emit the original values.
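For readers unfamiliar with the reshuffling step, here is a minimal sketch of the batching logic in plain Rust. It is not the replica code and uses no Hydro APIs. `Payload`, `Sequencer`, and `push_batch` are hypothetical names, and dropping payloads whose sequence number was already emitted is an assumption of this sketch.

```rust
use std::collections::BTreeMap;

/// Hypothetical payload type; the replica's real type is not shown in this thread.
#[derive(Debug, PartialEq)]
struct Payload {
    seq: u64,
    value: String,
}

/// Buffers out-of-order payloads and releases the contiguous run that
/// follows the highest sequence number emitted so far.
struct Sequencer {
    /// One past the highest sequence number already emitted.
    next_seq: u64,
    /// Payloads that arrived early, keyed by sequence number.
    pending: BTreeMap<u64, Payload>,
}

impl Sequencer {
    fn new(first_seq: u64) -> Self {
        Self { next_seq: first_seq, pending: BTreeMap::new() }
    }

    /// Takes one unordered batch and returns the payloads that are now next in order.
    fn push_batch(&mut self, batch: Vec<Payload>) -> Vec<Payload> {
        for p in batch {
            // Assumption: payloads older than `next_seq` are duplicates and are dropped.
            if p.seq >= self.next_seq {
                self.pending.insert(p.seq, p);
            }
        }
        let mut ready = Vec::new();
        // Payloads are moved out of the buffer, never cloned.
        while let Some(p) = self.pending.remove(&self.next_seq) {
            self.next_seq += 1;
            ready.push(p);
        }
        ready
    }
}

fn main() {
    let mk = |seq: u64| Payload { seq, value: format!("v{}", seq) };
    let mut sequencer = Sequencer::new(0);
    // Seq 2 arrives early and is held back until seq 1 shows up.
    let first = sequencer.push_batch(vec![mk(2), mk(0)]);
    let second = sequencer.push_batch(vec![mk(3), mk(1)]);
    println!("first batch emitted: {:?}", first);
    println!("second batch emitted: {:?}", second);
}
```

In this sketch the "highest previously seen sequence number" is just `next_seq - 1`, kept as local state rather than read back out of a store.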
Mingwei Samuel 6:13 PM
i.e. get the last element out of processed_payloads without having to stream it? (edited)
Shadaj Laddad :hydro: 6:46 PM
Yes and more importantly without having to clone it (edited)
Like it’s okay if for now we implement .last() as a fold, the performance overhead is coming from the clones, not the iteration over the elements
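To make the clone-vs-borrow point concrete, here is a rough sketch in plain Rust (no Hydro APIs) of `.last()` written as a fold over borrowed elements. `Payload`, `last_seq`, and `process_tick` are hypothetical names, and a `Vec` stands in for a bounded stream within a tick. `Payload` deliberately does not derive `Clone`, so the compiler confirms that nothing is copied.

```rust
/// Hypothetical payload type; note there is no `Clone` impl.
#[derive(Debug, PartialEq)]
struct Payload {
    seq: u64,
    value: String,
}

/// `.last().map(get sequence number)` expressed as a fold over references.
/// It still iterates over every element, but only a `u64` is copied out.
fn last_seq(payloads: &[Payload]) -> Option<u64> {
    payloads.iter().fold(None, |_, p| Some(p.seq))
}

/// Reads the highest sequence number by reference, then hands the original
/// payloads on by move, so both outputs come from a single owned batch.
fn process_tick(payloads: Vec<Payload>) -> (Option<u64>, Vec<Payload>) {
    let highest = last_seq(&payloads);
    (highest, payloads)
}

fn main() {
    let batch = vec![
        Payload { seq: 7, value: "a".to_string() },
        Payload { seq: 8, value: "b".to_string() },
    ];
    let (highest, emitted) = process_tick(batch);
    println!("highest = {:?}, emitted {} payloads", highest, emitted.len());
}
```

The borrow ends before the payloads are moved out, which is the property a stream-by-reference mechanism would need to provide at the dataflow level.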
Slightly unrelated, but the Paxos replica actually doesn't need to pluck out the highest number from the KV-store (there's another stream that already has that number, r_next_slot_after_processing_payloads on line 72).
The real reason we extract the number from the KV-store is because Hydro doesn't compile programs unless all streams end in a HydroLeaf, which is either a DestSink, ForEach, or (in this case) a CycleSink.
Since the replica's KV-store is write-only, it doesn't go to a DestSink, so this is actually kind of a compilation hack.