dd-trace-java
Losing Jetty 10 servlet traces and metrics in 1.19.0+
We recently encountered a hard-to-reproduce issue with traces and trace metrics after upgrading our agent to 1.19+. Everything would start out fine, but after about an hour, we'd stop getting traces from our gRPC endpoints (e.g. servlet.request /domain/method) along with the corresponding trace metrics.
- Application: a Kotlin web service based on https://github.com/cashapp/misk (it uses Jetty, hence servlet.request)
- Environment: eclipse-temurin:17-jammy Docker image on x86
As far as we can tell, this doesn't happen with 1.18.2 or 1.18.3, but happens with both 1.19.0 and 1.19.1.
@Hexcles :wave:
- What's the version of Misk and Jetty?
- Is there any error in the logs?
- Which configuration options are you passing to the tracer?
If you contact support, we can take a look and speed up troubleshooting.
- Misk is head of master; jetty is v10.0.15
- Nothing in Java logs; I'll check agent logs
- Tracer options: -Dtrace.Status404Rule.enabled=false -Ddd.sqs.propagation.enabled=false -Ddd.sqs.legacy.tracing.enabled=true
I'll try to reproduce again and file a support ticket. What information would you need btw? Flare?
Here's the content of -Ddatadog.slf4j.simpleLogger.logFile:
[dd.trace 2023-09-07 21:39:26:461 +0000] [dd-task-scheduler] INFO datadog.trace.agent.core.StatusLogger - DATADOG TRACER CONFIGURATION {"version":"1.20.1~70cd67ce90","os_name":"Linux","os_version":"5.10.184-175.731.amzn2.x86_64","architecture":"amd64","lang":"jvm","lang_version":"17.0.8.1","jvm_vendor":"Eclipse Adoptium","jvm_version":"17.0.8.1+1","java_class_version":"61.0","http_nonProxyHosts":"null","http_proxyHost":"null","enabled":true,"service":"<redacted>","agent_url":"<redacted>","agent_unix_domain_socket":"/var/run/datadog/apm.socket","agent_error":false,"debug":false,"trace_propagation_style_extract":["datadog"],"trace_propagation_style_inject":["datadog"],"analytics_enabled":false,"sampling_rules":[{},{}],"priority_sampling_enabled":true,"logs_correlation_enabled":true,"profiling_enabled":true,"remote_config_enabled":true,"debugger_enabled":false,"appsec_enabled":"ENABLED_INACTIVE","telemetry_enabled":true,"dd_version":"240f05dfa15915f7b8b2882eb0443fbe60872b26","health_checks_enabled":true,"configuration_file":"no config file present","runtime_id":"<redacted>","logging_settings":{"levelInBrackets":false,"dateTimeFormat":"'[dd.trace 'yyyy-MM-dd HH:mm:ss:SSS Z']'","logFile":"/var/log/datadog.log","configurationFile":"simplelogger.properties","showShortLogName":false,"showDateTime":true,"showLogName":true,"showThreadName":true,"defaultLogLevel":"INFO","warnLevelString":"WARN","embedException":false},"cws_enabled":false,"cws_tls_refresh":5000,"datadog_profiler_enabled":true,"datadog_profiler_safe":true,"datadog_profiler_enabled_overridden":false}
In other words, it is still present in 1.21.1. Filing a support ticket, too (1334466).
@Hexcles Thank you. We'll look into this. We might ask for further information through the support ticket if we need it.
Just for the record, this is an issue in the Jetty 10 instrumentation introduced in dd-trace-java v1.19.0. The workaround is to set -Ddd.integration.jetty.enabled=false (system property) or DD_INTEGRATION_JETTY_ENABLED=false (environment variable). Either setting makes the tracer fall back to the generic servlet instrumentation, which should give the same behavior for Jetty 10 as dd-trace-java releases prior to v1.19.0.
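For anyone applying the workaround, here is a minimal sketch of the two ways to set it. The agent path and the service jar name are placeholders rather than values from this thread, so adjust them to your deployment. Only one of the two options is needed.

```shell
# Option 1: pass the system property on the JVM command line.
# /path/to/dd-java-agent.jar and my-service.jar are placeholder paths.
java -javaagent:/path/to/dd-java-agent.jar \
     -Ddd.integration.jetty.enabled=false \
     -jar my-service.jar

# Option 2: set the environment variable before the JVM starts.
export DD_INTEGRATION_JETTY_ENABLED=false
java -javaagent:/path/to/dd-java-agent.jar -jar my-service.jar
```

In a container image, the environment variable is usually the easier of the two to set, since it does not require changing the JVM launch command.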
Hey, is there any update on this issue? It is impacting us as well. I verified that the DD_INTEGRATION_JETTY_ENABLED workaround works. Is there any downside to using it?