dd-trace-java

Losing Jetty 10 servlet traces and metrics in 1.19.0+

Open • Hexcles opened this issue 2 years ago • 7 comments

We recently ran into a hard-to-reproduce issue with traces and trace metrics after upgrading our agent to 1.19+. Everything would start out fine, but after about an hour we'd stop getting traces from our gRPC endpoints (e.g. servlet.request /domain/method), along with the corresponding trace metrics.

  • Application: a Kotlin web service based on https://github.com/cashapp/misk (it uses Jetty, hence servlet.request)
  • Environment: eclipse-temurin:17-jammy Docker image on x86

As far as we can tell, this doesn't happen with 1.18.2 or 1.18.3, but happens with both 1.19.0 and 1.19.1.

Hexcles, Aug 17 '23

@Hexcles :wave:

  • What's the version of Misk and Jetty?
  • Is there any error in the logs?
  • Which configuration options are you passing to the tracer?

If you contact support, we can take a look and speed up troubleshooting.

smola, Sep 07 '23

  • Misk is head of master; Jetty is v10.0.15
  • Nothing in Java logs; I'll check agent logs
  • -Dtrace.Status404Rule.enabled=false -Ddd.sqs.propagation.enabled=false -Ddd.sqs.legacy.tracing.enabled=true

I'll try to reproduce again and file a support ticket. What information would you need btw? Flare?

Hexcles, Sep 07 '23

Here's the content of the log file set via -Ddatadog.slf4j.simpleLogger.logFile:

[dd.trace 2023-09-07 21:39:26:461 +0000] [dd-task-scheduler] INFO datadog.trace.agent.core.StatusLogger - DATADOG TRACER CONFIGURATION {"version":"1.20.1~70cd67ce90","os_name":"Linux","os_version":"5.10.184-175.731.amzn2.x86_64","architecture":"amd64","lang":"jvm","lang_version":"17.0.8.1","jvm_vendor":"Eclipse Adoptium","jvm_version":"17.0.8.1+1","java_class_version":"61.0","http_nonProxyHosts":"null","http_proxyHost":"null","enabled":true,"service":"<redacted>","agent_url":"<redacted>","agent_unix_domain_socket":"/var/run/datadog/apm.socket","agent_error":false,"debug":false,"trace_propagation_style_extract":["datadog"],"trace_propagation_style_inject":["datadog"],"analytics_enabled":false,"sampling_rules":[{},{}],"priority_sampling_enabled":true,"logs_correlation_enabled":true,"profiling_enabled":true,"remote_config_enabled":true,"debugger_enabled":false,"appsec_enabled":"ENABLED_INACTIVE","telemetry_enabled":true,"dd_version":"240f05dfa15915f7b8b2882eb0443fbe60872b26","health_checks_enabled":true,"configuration_file":"no config file present","runtime_id":"<redacted>","logging_settings":{"levelInBrackets":false,"dateTimeFormat":"'[dd.trace 'yyyy-MM-dd HH:mm:ss:SSS Z']'","logFile":"/var/log/datadog.log","configurationFile":"simplelogger.properties","showShortLogName":false,"showDateTime":true,"showLogName":true,"showThreadName":true,"defaultLogLevel":"INFO","warnLevelString":"WARN","embedException":false},"cws_enabled":false,"cws_tls_refresh":5000,"datadog_profiler_enabled":true,"datadog_profiler_safe":true,"datadog_profiler_enabled_overridden":false}
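The tracer version in play can be read from the DATADOG TRACER CONFIGURATION line above. The following is a minimal sketch, not part of dd-trace-java (the class and method names are hypothetical), that pulls the "version" field out with a regular expression, assuming the JSON layout shown above, and checks it against 1.19.0, the first release reported as affected in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TracerVersionCheck {
    // Matches the top-level "version" field, e.g. "version":"1.20.1~70cd67ce90".
    // Keys such as "lang_version" or "dd_version" do not match because the
    // pattern requires a quote immediately before "version".
    private static final Pattern VERSION =
            Pattern.compile("\"version\":\"(\\d+)\\.(\\d+)\\.(\\d+)");

    /** Returns {major, minor, patch}, or null if no version field is found. */
    static int[] parseVersion(String logLine) {
        Matcher m = VERSION.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    /** True if v is greater than or equal to major.minor.patch. */
    static boolean isAtLeast(int[] v, int major, int minor, int patch) {
        if (v[0] != major) return v[0] > major;
        if (v[1] != minor) return v[1] > minor;
        return v[2] >= patch;
    }

    public static void main(String[] args) {
        // Shortened copy of the StatusLogger line; the real line has many more fields.
        String line = "DATADOG TRACER CONFIGURATION {\"version\":\"1.20.1~70cd67ce90\","
                + "\"os_name\":\"Linux\",\"lang_version\":\"17.0.8.1\"}";
        int[] v = parseVersion(line);
        System.out.println(v[0] + "." + v[1] + "." + v[2]
                + " in reported range: " + isAtLeast(v, 1, 19, 0));
    }
}
```

A regex keeps the sketch dependency-free; for anything beyond a quick check, parsing the JSON payload with a real JSON library would be more robust.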

Hexcles, Sep 07 '23

In other words, it is still present in 1.20.1. I'm filing a support ticket too (1334466).

Hexcles, Sep 07 '23

@Hexcles Thank you. We'll look into this. We might ask for further information through the support ticket if we need it.

smola, Sep 08 '23

Just for the record, this is an issue in the Jetty 10 instrumentation introduced in dd-trace-java v1.19.0. The workaround is to set -Ddd.integration.jetty.enabled=false (system property) or DD_INTEGRATION_JETTY_ENABLED=false (environment variable). The tracer then falls back to the generic servlet instrumentation, which should give Jetty 10 the same behavior as dd-trace-java releases prior to v1.19.0.
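To confirm at startup which way a given JVM is configured, a small check can read both spellings of the setting. This is a hypothetical sketch, not dd-trace-java code: it derives the environment-variable name from the property name (upper-case, dots and dashes to underscores), treats only an explicit false as disabling the integration (the tracer's own boolean parsing may accept other spellings), and assumes the system property takes precedence when both are set.

```java
import java.util.Locale;

public class JettyIntegrationCheck {
    static final String PROPERTY = "dd.integration.jetty.enabled";

    // Environment-variable form of a dd.* property:
    // upper-case, with '.' and '-' replaced by '_'.
    static String toEnvName(String property) {
        return property.toUpperCase(Locale.ROOT).replace('.', '_').replace('-', '_');
    }

    // Enabled unless the effective value is explicitly "false".
    // Assumption: the system property wins over the environment variable.
    static boolean jettyIntegrationEnabled(String sysProp, String envVar) {
        String value = sysProp != null ? sysProp : envVar;
        return value == null || !value.trim().equalsIgnoreCase("false");
    }

    public static void main(String[] args) {
        boolean enabled = jettyIntegrationEnabled(
                System.getProperty(PROPERTY), System.getenv(toEnvName(PROPERTY)));
        System.out.println(toEnvName(PROPERTY) + " -> Jetty integration "
                + (enabled ? "enabled" : "disabled (generic servlet fallback)"));
    }
}
```

Running it with -Ddd.integration.jetty.enabled=false, or with DD_INTEGRATION_JETTY_ENABLED=false in the environment, should report the integration as disabled.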

smola, Sep 20 '23

Hey, is there any update on this issue? It is impacting us as well. I verified that the DD_INTEGRATION_JETTY_ENABLED workaround works; is there any downside to using it?

damar-block, Nov 30 '23