Include transaction details for errors which are reported during an async span
Is your feature request related to a problem? Please describe.
Context: we are running a Rails application using ElasticAPM (see versions below)
We spawn multiple threads during an individual request into our system to do various units of work (mostly making asynchronous HTTP requests). For the execution of those threads we wrap the block in ElasticAPM.with_span and set the parent to be the current_transaction from the main thread. For the most part, this seems to work as expected in that we see the single transaction and the async spans show up in APM in Kibana. Where it seems to fall down is if we report an error during the execution of one of those threads. While ElasticAPM.current_span is correct and references the span we created by calling with_span, ElasticAPM.current_transaction is nil. When ElasticAPM.current_transaction is nil the report method will not add transactional context. Custom context does not work either.
My inclination is that given how async spans work (they have a reference to their parent as well as the overarching transaction) and that ElasticAPM.current_transaction will be nil on those threads (as this is stored in thread local storage in Instrumenter and so a new thread does not have one) that it would be fair for the ErrorBuilder to attempt to pull a transaction from first ElasticAPM.current_transaction (as it does today) and then from ElasticAPM.current_span&.transaction. It feels like this aligns with how async spans should be working and would allow error reporting from threads which have started an async span to have context, but I am most likely missing something!
Describe the solution you'd like
Either via the mechanism described above or some other way, we would like the ability to report errors on threads that can include transactional and custom context.
Describe alternatives you've considered
We attempted to set the current_transaction at the beginning of our Thread (before calling with_span) by digging into the instrumenter:
ElasticAPM.agent.instrumenter.current_transaction = ElasticAPM.current_transaction
however this caused all sorts of problems.
Our current alternative would be to not have the errors correlate back to the transaction, though this feels as if we are not getting the most out of our ELK/APM stack.
Additional context
rails 5.2.4.5 elastic-apm 3.5.0 (also tried on 3.13.0)
Hi @bscharm! Thank you for bringing up this shortcoming of reporting errors with async spans and for the detailed description. From a high level, the solution you proposed sounds reasonable. I'll take a look and have an update soon!
Hi again @bscharm I opened a PR (#975) to fix this. I have one outstanding question, which you can see in the PR description. Perhaps you have an opinion on it that you'd like to share? In the meantime, if you could test the PR and let me know if it resolves the issue, that'd be great. Thank you!
@estolfo I commented on the PR, but after some further investigation we are finding the behavior of the spans / threads is still pretty inconsistent. If we use with_span on our threads, we always lose any spans from the main thread and also don't always see all spans that should exist from the spawned threads. When the with_span block ends, there are sometimes multiple spans still on the current thread and they don't all end and thus don't show up in Kibana. The only way we capture all spans is if we turn off threading completely. We tried some hacky code that looped calling end_span at the end of the thread and that seemed to work for seeing all the spans from the threads, but their timing was inaccurate and we still weren't seeing main thread spans.
Hi @bscharm thanks for this update and trying out the PR. I'll do some more testing and have more information next week.