hydroflow
hydroflow copied to clipboard
design: `poll_futures` is non-ergonomic in that it drives ticks from the middle of a stream
a linear pipeline starting with source_iter should probably only run on the first tick. However if it has a poll_futures in it, it may run on subsequent ticks.
This suggests that poll_futures is probably better modeled as a sink (0-output) and source (0-input) operator combination, rather than an inline operator.
We should decide and resolve.
Note that in the new tick formalization where there are no global ticks, this changes even further. Things like futures become extra weird because operators are not allowed to spontaneously generate outputs when not provided new inputs (they are deterministic). So separating into a source/sink with a key to pair them together will help.
To add on, I also think that we can absolutely sacrifice ergonomics at the Hydroflow IR level. For example with futures, we can do the source/sink split while preserving a nice inline API at the Hydroflow+ level. With the new scope ticks, something like:
hydroflow_plus::loop(|cur_tick| { // creates a scoped tick context
let futures_stream = // ...
let future_results = futures_stream.poll_futures();
// we can't directly use `future_results` because its values are generated at the tick scope outside of this, so
future_results.batched().for_each(...)
})
This is nice because it also makes it more explicit that the results of the futures are re-batched / re-ordered.
I'm not sure I understand enough about the new tick system to understand the last comment, but otherwise I totally agree.
I think .poll_futures makes sense as a Hydroflow+ operator in the same way that .send_to does (implied sink -> source).
When pair programming with Mingwei, we briefly discussed implementing it this way, but he suggested that there wasn't a simple/obvious way to structure this implementation in the existing Hydroflow source, as we basically want an operator split across two subgraphs and it wasn't immediately obvious how to pass future data between them. Thus, we went with the current approach to get an mvp up and running quickly.
Decision is to remove poll_futures from Hydroflow, and defer to Hydroflow+.
Hmm I am confused. This can't be implemented in purely Hydroflow+, we definitely need a low level operator because it is closely tied to the scheduler.
Seems like a design/planning meeting is in order on this topic. Let's agree on the design ASAP -- if we can't get a decision/implementation this week for the release let's remove and reinstitute in the next release.
Could be implemented with an external channel (tokio channel) though maybe against the spirit of keeping things in hydroflow
Also interesting - if we split into a sink_futures and source_stream and the two operators are in the same stratum, then it may be weird if a future gets sent out and received in the same tick/stratum run
I'd assume the future handling is idempotent, so i don't think the length of the delay (including 0!) should matter? Worst case we send the future twice and it gets filtered out by idempotence enforcement?
On Mon, May 20, 2024 at 1:08 PM Mingwei Samuel @.***> wrote:
Also interesting - if we split into a sink_futures and source_stream and the two operators are in the same stratum, then it may be weird if a future gets sent out and received in the same tick/stratum run
— Reply to this email directly, view it on GitHub https://github.com/hydro-project/hydroflow/issues/1183#issuecomment-2121124452, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAC54QJSLVJ5D65C2IMHNBDZDJJ3ZAVCNFSM6AAAAABHJMHLRGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMRRGEZDINBVGI . You are receiving this because you were assigned.Message ID: @.***>
Current plan is to remove from 0.8 release and redesign for 0.9