hydroflow Diagnostics: ops need some ergonomic way of generating flow of debug info (e.g. flag when all work is done)

Open zzlk opened this issue 1 year ago • 3 comments

It is very common in imperative code to wrap a piece of work in wall-clock timers or event logging so that you can accurately track when a request has finished processing. This is difficult to do completely correctly in hydroflow. What usually happens is that the event is emitted slightly before the request has been completely processed, for example:

req -> inspect(|x| println!("request finished processing")) -> join<'static>()

but this is not completely correct, because the event is emitted before the downstream 'work' (here, the join) has been done.
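
To make the ordering problem concrete, here is a minimal sketch using plain Rust iterators rather than hydroflow itself. The function name `run_pipeline` and the `map` stage standing in for the downstream join are assumptions for illustration only. It records events in the order they happen, so you can see that the "finished" event for each request is logged before the downstream stage has touched it.

```rust
use std::cell::RefCell;

/// Runs a tiny two-stage pipeline and returns the recorded events in order.
/// `inspect` plays the role of the logging op; `map` stands in for the
/// downstream work (the join in the snippet above).
fn run_pipeline(requests: &[u32]) -> Vec<String> {
    let log = RefCell::new(Vec::new());
    let results: Vec<u32> = requests
        .iter()
        .copied()
        // Logged as soon as the item passes this point in the chain.
        .inspect(|r| log.borrow_mut().push(format!("finished {r}")))
        // The real work only happens here, after the log line above.
        .map(|r| {
            log.borrow_mut().push(format!("work {r}"));
            r * 2
        })
        .collect();
    // Only at this point has all of the work actually been done.
    log.borrow_mut().push(format!("collected {}", results.len()));
    log.into_inner()
}
```

Calling `run_pipeline(&[1, 2])` records "finished 1" before "work 1" and "finished 2" before "work 2", which is the same premature-event behaviour described above.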

Another way to do this would be to use run_tick() and emit the event once run_tick() finishes, but this is not very ergonomic and does not cooperate well with async.
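
For reference, the imperative pattern being described looks roughly like the sketch below. The helper `timed` is hypothetical, and the closure passed to it is only a stand-in for "run one tick to completion"; nothing here is hydroflow API. It shows why timing around the whole call is accurate: the clock is read only after the work has returned.

```rust
use std::time::{Duration, Instant};

/// Runs `work` to completion and returns its result together with the
/// elapsed wall-clock time. The "done" point is after `work` returns.
fn timed<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = work();
    (result, start.elapsed())
}
```

Usage would be something like `let (sum, elapsed) = timed(|| (1..=100u64).sum::<u64>());`. The drawback noted above still applies: the caller has to own the driving loop, which is awkward when the flow is run from async code.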

Jul 12 '23 20:07 zzlk

Seems like we need to think carefully about what metrics we want. Then we can decide how to implement/surface these. Could be internal to ops, could be plumb-able in the dataflow? If internal to ops, it would need to be a uniform API across all ops or it's too ugly. This merits a design doc and review before implementation. It would then need to be documented carefully in the book under "performance debugging" or the like.

Jul 21 '23 18:07 jhellerstein

Generally would be nice to have an operator API for DEBUG/LOG stream out of an operator. As regards "work is done", this seems operator-specific.

One possible API:

   chain = op1() -> op2();
   chain.debug() -> inspect(...)
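
As a rough illustration of what a separate debug stream could look like, here is a sketch in plain Rust using iterators and a std channel. Everything in it is hypothetical (`Debugged`, `with_debug`, the message format); it is not an existing hydroflow API and only mimics the shape of the `chain.debug()` idea: data flows through unchanged while events, including an end-of-input event, go out a side channel.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical operator wrapper: forwards items unchanged and reports
/// per-item and end-of-input events on a separate debug channel.
struct Debugged<I> {
    inner: I,
    name: &'static str,
    debug_tx: Sender<String>,
    done: bool,
}

impl<I: Iterator> Iterator for Debugged<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        match self.inner.next() {
            Some(item) => {
                // Send errors are ignored: nobody listening is not a failure.
                let _ = self.debug_tx.send(format!("{}: item", self.name));
                Some(item)
            }
            None => {
                // Report "done" only once, even if polled again afterwards.
                if !self.done {
                    self.done = true;
                    let _ = self.debug_tx.send(format!("{}: done", self.name));
                }
                None
            }
        }
    }
}

/// Wraps `inner` and returns it with the receiving end of its debug stream.
fn with_debug<I: Iterator>(inner: I, name: &'static str) -> (Debugged<I>, Receiver<String>) {
    let (debug_tx, debug_rx) = channel();
    (Debugged { inner, name, debug_tx, done: false }, debug_rx)
}
```

With `let (chain, debug) = with_debug((1..=3).map(|x| x * 10), "op1->op2");`, collecting `chain` yields the data as usual, and draining `debug` afterwards gives three "item" events followed by one "done" event. What "done" means here is specific to this wrapper, which matches the point that "work is done" is operator-specific.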

Aug 14 '23 16:08 jhellerstein

For 'work is done' I made this change: https://github.com/hydro-project/hydroflow/pull/887 and I think that is good for the 'work is done' use case, which was the primary driver of the demand for this.

Oct 16 '23 16:10 zzlk

Currently we have a recipe for compiling and generating flamegraphs that captures all the operator paths (we think). But it aggregates that across all calls. Perhaps we'll decide that + inspect is adequate, but some design requirements would probably be good here. Maybe that can be driven by users like @davidchuyaya.

May 20 '24 16:05 jhellerstein
