apm-agent-java icon indicating copy to clipboard operation
apm-agent-java copied to clipboard

Inferred spans from loop don't seem to correlate correctly

Open DJRickyB opened this issue 2 years ago • 4 comments

Recently to investigate tail latencies for our knn search in Elasticsearch, we've added the capability to profile the path of execution in a search transaction, yielding traces that look like this: image

What's strange is the termination of HnswGraphSearcher#search with the first iteration of searchLevel. See the expected code here: https://github.com/apache/lucene/blob/728936eb7cbb91bfa624686007b02b086d10cf41/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L155-L157

I would expect rather something that looks like this: image

We see in the stack trace info for each subsequent searchLevel invocation that HnswGraphSearcher#search is detected, but for some reason it's not visualized: image

DJRickyB avatar Oct 10 '22 14:10 DJRickyB

Hey @DJRickyB , this indeed looks incorrect.

Just to be sure: All of the spans shown in the screenshot are inferred, correct? E.g. none are instrumented via the trace_methods configuration?

Could you maybe also provide the JSON-documents of the spans and transaction from your example trace as a gist. This might help us investigating what's going on here.

JonasKunz avatar Oct 11 '22 08:10 JonasKunz

Hi @DJRickyB ,

There are two main options to help us debug this issue:

  1. capture the sampling profiler files and send them back to us for investigation
  2. provide a reproducible way to trigger this issue

In this case the first option seems the easiest to achieve short term, thus I would suggest to do the following:

  • add profiling_inferred_spans_backup_diagnostic_files=true to the agent configuration
  • restart ES and execute the transaction once
  • send us the content of the /tmp/profiler folder

SylvainJuge avatar Oct 18 '22 12:10 SylvainJuge

Hi @DJRickyB, are you able to provide these or if not, is there a technical issue we can help with?

jackshirazi avatar Nov 14 '22 15:11 jackshirazi

Thanks all, I am sorry I have not been able to circle back yet with the requested information. I intend to work on this the week of Nov 28-Dec 2 and will retrieve diagnostics then

DJRickyB avatar Nov 14 '22 16:11 DJRickyB