xtra icon indicating copy to clipboard operation
xtra copied to clipboard

Way to mark time spent in queue and handler as follows_from request span

Open Restioson opened this issue 3 years ago • 4 comments

When split_receiver is used and the receiver is discarded, the xtra_actor_request span will still include the message handling time, which will lead to the containing span being longer than it should actually be in some cases. This is because the work is done somewhere else, and, unlike without split_receiver, is not waited for. Therefore, there needs to be some way to disconnect the handler and waiting time in queue from the xtra_actor_request span.

Consider the following behaviour under master:

#[instrument]
async fn do_something_quickly(addr: Address<A>) {
    let _ = addr.send(LongMessage).split_receiver().await;
    tracing::info!("Done!");
}

"Done!" will print, and then the do_something_quickly span may live long past that, since it will only end when the grandchild xtra_message_handler span for LongMessage finishes too.

Ideally, this is would be the easiest and most natural thing to do in the case of let _ = addr.send(..).split_receiver().await;. However, some subscribers (e.g Jaeger, Zipkin, and Grafana Tempo[^1]) do not support back-referencing spans linked to a given would-be parent (i.e one which the other spans follow from but are not children of). They only show this link on the would-be child span. This can make it hard to navigate traces, so in some instances it can be intended to make these spans children, even though they are semantically not, as a compromise in the favour of explorability.


[^1]: I'm pretty sure Grafana Tempo just re-exports Jaeger's view, though I don't know for certain.

Restioson avatar Jul 18 '22 11:07 Restioson

I don't think I understand enough about spans to help here but if somehow possible, I'd be great to have a failing test case that we can work towards fixing.

thomaseizinger avatar Oct 28 '22 02:10 thomaseizinger

Maybe we should postpone this to 0.7 too?

Restioson avatar Jan 31 '23 08:01 Restioson

I certainly don't know how to fix it :)

thomaseizinger avatar Feb 02 '23 01:02 thomaseizinger

Postponing to 0.7 as it is unclear how this should be handled. I know the principled solution here, but I feel like it is not that useful, as I've mentioned how this can lead to weird behaviour and degraded debuggability in the consuming software (grafana tempo, zipkin, jaeger, etc) after playing around with some different solutions on a real application.

Restioson avatar Feb 15 '23 11:02 Restioson