apm-agent-java
apm-agent-java copied to clipboard
Inferred spans from loop don't seem to correlate correctly
Recently to investigate tail latencies for our knn
search in Elasticsearch, we've added the capability to profile the path of execution in a search transaction, yielding traces that look like this:
What's strange is the termination of HnswGraphSearcher#search
with the first iteration of searchLevel
. See the expected code here:
https://github.com/apache/lucene/blob/728936eb7cbb91bfa624686007b02b086d10cf41/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L155-L157
I would expect rather something that looks like this:
We see in the stack trace info for each subsequent searchLevel
invocation that HnswGraphSearcher#search
is detected, but for some reason it's not visualized:
Hey @DJRickyB , this indeed looks incorrect.
Just to be sure: All of the spans shown in the screenshot are inferred, correct? E.g. none are instrumented via the trace_methods
configuration?
Could you maybe also provide the JSON-documents of the spans and transaction from your example trace as a gist. This might help us investigating what's going on here.
Hi @DJRickyB ,
There are two main options to help us debug this issue:
- capture the sampling profiler files and send them back to us for investigation
- provide a reproducible way to trigger this issue
In this case the first option seems the easiest to achieve short term, thus I would suggest to do the following:
- add
profiling_inferred_spans_backup_diagnostic_files=true
to the agent configuration - restart ES and execute the transaction once
- send us the content of the
/tmp/profiler
folder
Hi @DJRickyB, are you able to provide these or if not, is there a technical issue we can help with?
Thanks all, I am sorry I have not been able to circle back yet with the requested information. I intend to work on this the week of Nov 28-Dec 2 and will retrieve diagnostics then