
Latest java agent 1.35.1 and .2 breaks APM tracing

Open kashyap-parikh-ah opened this issue 1 year ago • 6 comments

Following these instructions: https://docs.datadoghq.com/tracing/trace_collection/automatic_instrumentation/dd_libraries/java/?tab=curl

It pulls latest version of dd-java-agent.jar

Version 1.35.1 breaks all traces.

kashyap-parikh-ah avatar Jun 19 '24 03:06 kashyap-parikh-ah

Hi @kashyap-parikh-ah, can you provide more information, for example tracer debug logs for both the previous version you were using and 1.35.2? You can provide these via a support ticket. It would be especially useful to compare the startup part of the debug logs from the previous version with the same part from 1.35.2.

It would be also useful to know:

  • What kind of application are you attaching the tracer to (library versions etc.)
  • Full version and vendor string of the JVM you are using
  • How are you attaching the Java tracer, single step or using -javaagent ?
  • What traces did you get with the previous version that you don't with v1.35.2?
  • How are traces broken - is it you don't get traces at all or something else?

mcculls avatar Jun 19 '24 06:06 mcculls

Hey @mcculls we opened up a ticket #1747501. We provided some of the details requested in there. For the tracer debug logs, we'll have to find a couple services to turn it on.

At a high level: we instrument using -javaagent, and once our services were on 1.35.2, we saw 0 APM metrics/traces in Datadog. We pinned 1.35.0 and they showed up again.

All our discussions moving forward will be in the ticket.

tantran-earnin avatar Jun 20 '24 21:06 tantran-earnin

Thanks @tantran-earnin - having debug logs for the same service before and after the upgrade will help us identify if there were any differences in which types got transformed, as well as if there were any unexpected issues during startup.

So far we haven't identified anything in that release which could result in a complete loss of metrics/traces - the debug logs will also help us identify if the tracer is producing the same spans as before, which might then point towards a communication issue between the tracer and the agent.

mcculls avatar Jun 20 '24 22:06 mcculls

Also can you see if the issue goes away when you use the same version for dd-trace-api and the Java tracer?

mcculls avatar Jun 20 '24 22:06 mcculls

Matching the dd-trace-api version with dd-java-agent didn't seem to fix it. Will try to enable tracer debug mode for both 1.35.0 and 1.35.2.

tantran-earnin avatar Jun 21 '24 00:06 tantran-earnin

I've reverted matching of the dd-trace-api. I redeployed both 1.35.0 and 1.35.2 with debug tracer logs. Links to the logs are noted in the ticket.

tantran-earnin avatar Jun 21 '24 01:06 tantran-earnin

Answered via support ticket 1747501

mcculls avatar Jul 04 '24 10:07 mcculls

Can you share the solution here? We are seeing exactly the same thing

snuderl avatar Aug 19 '24 13:08 snuderl

Hi @snuderl - this was resolved by updating the configuration.

Versions 1.35.x and later started to honour various OpenTelemetry settings to help migration. One of these settings was OTEL_TRACES_EXPORTER=none which tells the tracer not to send traces. This is mapped to its Datadog equivalent, DD_TRACE_ENABLED=false.

If you are setting OTEL_TRACES_EXPORTER=none in your environment, but still want to send Datadog traces then you'll need to add DD_TRACE_ENABLED=true to your environment to override the mapped OpenTelemetry setting.

If you don't have OTEL_TRACES_EXPORTER=none in your environment then you are encountering a different issue and should open a new ticket through support.
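
For readers who want to reason about the precedence described above, here is a minimal sketch in Java. It is not the tracer's actual code: the class name `TraceEnabledSketch` and the method `isTraceEnabled` are hypothetical, and the case-insensitive match on `none` and the default of "enabled" when neither variable is set are assumptions made for illustration. It only models the rule stated in this thread: an explicit DD_TRACE_ENABLED wins, otherwise OTEL_TRACES_EXPORTER=none disables tracing.

```java
import java.util.Map;

public class TraceEnabledSketch {

    // Hypothetical helper: models the override rule described in this thread,
    // not the real dd-trace-java configuration code.
    public static boolean isTraceEnabled(Map<String, String> env) {
        // An explicit Datadog setting takes priority over the mapped OpenTelemetry one.
        String explicit = env.get("DD_TRACE_ENABLED");
        if (explicit != null) {
            return Boolean.parseBoolean(explicit.trim());
        }
        // OTEL_TRACES_EXPORTER=none is treated like DD_TRACE_ENABLED=false.
        // Assumption: the comparison ignores case and surrounding whitespace.
        String otelExporter = env.get("OTEL_TRACES_EXPORTER");
        if (otelExporter != null && otelExporter.trim().equalsIgnoreCase("none")) {
            return false;
        }
        // Assumption: tracing is on when neither variable is set.
        return true;
    }

    public static void main(String[] args) {
        // Evaluate the rule against the current process environment.
        System.out.println("tracing enabled (per sketch): " + isTraceEnabled(System.getenv()));
    }
}
```

Running `main` in the same environment as the affected service gives a quick check of whether the OTEL_TRACES_EXPORTER=none case applies to you before you add DD_TRACE_ENABLED=true.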

mcculls avatar Aug 19 '24 13:08 mcculls

Yeah this is exactly what is happening to us. Thanks!

snuderl avatar Aug 19 '24 16:08 snuderl