apm-agent-ruby icon indicating copy to clipboard operation
apm-agent-ruby copied to clipboard

Include transaction details for errors which are reported during an async span

Open bscharm opened this issue 4 years ago • 4 comments

Is your feature request related to a problem? Please describe.

Context: we are running a Rails application using ElasticAPM (see versions below)

We spawn multiple threads during an individual request into our system to do various units of work (mostly making asynchronous HTTP requests). For the execution of those threads we wrap the block in ElasticAPM.with_span and set the parent to be the current_transaction from the main thread. For the most part, this seems to work as expected in that we see the single transaction and the async spans show up in APM in Kibana. Where it seems to fall down is if we report an error during the execution of one of those threads. While ElasticAPM.current_span is correct and references the span we created by calling with_span, ElasticAPM.current_transaction is nil. When ElasticAPM.current_transaction is nil the report method will not add transactional context. Custom context does not work either.

My inclination is that given how async spans work (they have a reference to their parent as well as the overarching transaction) and that ElasticAPM.current_transaction will be nil on those threads (as this is stored in thread local storage in Instrumenter and so a new thread does not have one) that it would be fair for the ErrorBuilder to attempt to pull a transaction from first ElasticAPM.current_transaction (as it does today) and then from ElasticAPM.current_span&.transaction. It feels like this aligns with how async spans should be working and would allow error reporting from threads which have started an async span to have context, but I am most likely missing something!

Describe the solution you'd like

Either via the mechanism described above or some other way, we would like the ability to report errors on threads that can include transactional and custom context.

Describe alternatives you've considered

We attempted to set the current_transaction at the beginning of our Thread (before calling with_span) by digging into the instrumenter:

ElasticAPM.agent.instrumenter.current_transaction = ElasticAPM.current_transaction

however this caused all sorts of problems.

Our current alternative would be to not have the errors correlate back to the transaction, though this feels as if we are not getting the most out of our ELK/APM stack.

Additional context

rails 5.2.4.5 elastic-apm 3.5.0 (also tried on 3.13.0)

bscharm avatar Feb 25 '21 01:02 bscharm

Hi @bscharm! Thank you for bringing up this shortcoming of reporting errors with async spans and for the detailed description. From a high level, the solution you proposed sounds reasonable. I'll take a look and have an update soon!

estolfo avatar Feb 25 '21 13:02 estolfo

Hi again @bscharm I opened a PR (#975) to fix this. I have one outstanding question, which you can see in the PR description. Perhaps you have an opinion on it that you'd like to share? In the meantime, if you could test the PR and let me know if it resolves the issue, that'd be great. Thank you!

estolfo avatar Feb 25 '21 16:02 estolfo

@estolfo I commented on the PR, but after some further investigation we are finding the behavior of the spans / threads is still pretty inconsistent. If we use with_span on our threads, we always lose any spans from the main thread and also don't always see all spans that should exist from the spawned threads. When the with_span block ends, there are sometimes multiple spans still on the current thread and they don't all end and thus don't show up in Kibana. The only way we capture all spans is if we turn off threading completely. We tried some hacky code that looped calling end_span at the end of the thread and that seemed to work for seeing all the spans from the threads, but their timing was inaccurate and we still weren't seeing main thread spans.

bscharm avatar Feb 25 '21 22:02 bscharm

Hi @bscharm thanks for this update and trying out the PR. I'll do some more testing and have more information next week.

estolfo avatar Feb 26 '21 09:02 estolfo