hydroflow
Diagnostics: ops need some ergonomic way of generating flow of debug info (e.g. flag when all work is done)
It is very common in imperative code to perform some work and wrap it in wall-clock timers or event logging, so that users can accurately track when a request has finished processing. This is difficult to do completely correctly in hydroflow. Usually, users end up emitting the event slightly before the request has been completely processed, for example:
req -> inspect(|x| println!("request finished processing")) -> join::<'static>()
but this is not completely correct because the event is emitted slightly before all the 'work' has been done.
Another way to do this would be to call run_tick() and emit the event once run_tick() returns, but this is not very ergonomic and does not cooperate well with async.
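For comparison, the imperative pattern described above can be sketched in plain Rust. This is only an illustration, not hydroflow code: `timed` and `handle_request` are hypothetical names that do not exist in hydroflow. The point is that the timer stops only after the work has returned, which is the guarantee that the inspect() placement above does not give.

```rust
use std::time::{Duration, Instant};

/// Runs `work` and returns its result together with the elapsed wall-clock time.
/// `timed` is a hypothetical helper name, not part of hydroflow.
fn timed<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = work();
    // The clock is read only after `work` has returned,
    // so the measurement covers all of the work.
    (result, start.elapsed())
}

/// Hypothetical request handler standing in for "some work".
fn handle_request(req: u64) -> u64 {
    (0..=req).sum()
}
```

Calling `timed(|| handle_request(10))` returns the handler's result along with a `Duration` that can be logged once the request is known to be finished.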
Seems like we need to think carefully about which metrics we want. Then we can decide how to implement and surface them. Could be internal to ops, could be plumb-able in the dataflow? If internal to ops, there would need to be a uniform API across all ops, or it gets too ugly. This merits a design doc and review before implementation. It might then need to be documented carefully in the book under "performance debugging" or the like.
Generally would be nice to have an operator API for DEBUG/LOG stream out of an operator. As regards "work is done", this seems operator-specific.
One possible API:
chain = op1() -> op2();
chain.debug() -> inspect(...)
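To make the idea of a separate debug stream concrete, here is a rough model in plain Rust. It is not the proposed hydroflow API and uses no hydroflow types: `DebugEvent` and `run_with_debug` are hypothetical names, and a standard-library channel stands in for whatever chain.debug() would return. The sketch assumes an operator can report per-item progress and a final "done" marker on a side channel, separate from its data output.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical debug events an operator might emit on its side channel.
#[derive(Debug, PartialEq)]
enum DebugEvent {
    /// The item at this index has been fully processed.
    ItemProcessed(usize),
    /// All input has been processed.
    Done,
}

/// Applies `f` to every item and reports progress on `debug`.
/// Hypothetical stand-in for an operator with a debug output.
fn run_with_debug<T, U>(
    items: Vec<T>,
    mut f: impl FnMut(T) -> U,
    debug: &Sender<DebugEvent>,
) -> Vec<U> {
    let mut out = Vec::with_capacity(items.len());
    for (i, item) in items.into_iter().enumerate() {
        out.push(f(item));
        // Sent after the work for this item, never before.
        // A send error only means nobody is listening, so it is ignored.
        let _ = debug.send(DebugEvent::ItemProcessed(i));
    }
    let _ = debug.send(DebugEvent::Done);
    out
}
```

A caller would create the channel with `std::sync::mpsc::channel()`, pass the sender in, and consume the receiver the way the inspect(...) above consumes chain.debug(). Keeping debug events on their own channel means the data path is unchanged whether or not anyone listens.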
For 'work is done', I made this change: https://github.com/hydro-project/hydroflow/pull/887. I think it is good enough for the 'work is done' use case, which was the primary driver of the demand for this.
Currently we have a recipe for compiling and generating flamegraphs that captures all the operator paths (we think). But it aggregates that across all calls. Perhaps we'll decide that + inspect is adequate, but some design requirements would probably be good here. Maybe that can be driven by users like @davidchuyaya.