dd-trace-java
dd-trace-java copied to clipboard
Unable to disable tracing via env var, and connection times out after 30 seconds rather than default
Hello, I'm having a couple of issues with the Java trace API.
- When setting the environment variable
DD_TRACE_ENABLED
to false, the tracing library still attempts to initiate a connection to an agent endpoint. In the event it cannot connect, this will take up to the full timeout to exit, whereas I'd expect it to disable all tracing.
user@host:~$ export DD_TRACE_ENABLED=false
user@host:~$ time <long java start command> -pdd -Ddd.trace.health.metrics.enabled=true -Ddd.agent.host=localhost java/lib/hellojava.jar
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
[dd.trace 2024-09-10 20:18:39:649 +0000] [main] DEBUG datadog.trace.agent.tooling.AgentInstaller - Adding filtered classes - inst
rumentation.class=datadog.trace.instrumentation.akka.concurrent.AkkaForkJoinTaskInstrumentation
[dd.trace 2024-09-10 20:18:39:652 +0000] [main] DEBUG datadog.trace.agent.tooling.AgentInstaller - Adding filtered classes - inst
rumentation.class=datadog.trace.instrumentation.akka.concurrent.AkkaMailboxInstrumentation
[dd.trace 2024-09-10 20:18:39:686 +0000] [main] DEBUG datadog.trace.agent.tooling.AgentInstaller - Adding filtered classes - inst
rumentation.class=datadog.trace.instrumentation.googlepubsub.PublisherInstrumentation
[dd.trace 2024-09-10 20:18:39:716 +0000] [main] DEBUG datadog.trace.agent.tooling.AgentInstaller - Adding filtered classes - inst
rumentation.class=datadog.trace.instrumentation.java.concurrent.JavaForkJoinTaskInstrumentation
...
[dd.trace 2024-09-10 20:19:23:055 +0000] [main] DEBUG datadog.trace.agent.tooling.TracerInstaller - Tracing is disabled, not installing GlobalTracer.
...
[dd.trace 2024-09-10 20:19:23:389 +0000] [main] EXCLUDE_TELEMETRY datadog.communication.ddagent.DDAgentFeaturesDiscovery - Error querying info at http://localhost:8126/
real 0m1.994s
user 0m2.931s
sys 0m0.271s
- After disabling the agent running on localhost, when I set the
agent.host
to localhost the connection fails within less than 3 seconds and exits:
user@host:~$ time <long java start command> -pdd -Ddd.trace.health.metrics.enabled=true -Ddd.agent.host=localhost java/lib/hellojava.jar
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
[dd.trace 2024-09-10 20:14:32:022 +0000] [dd-telemetry] WARN datadog.telemetry.TelemetryRouter - Got FAILURE sending telemetry request to http://localhost:8126/telemetry/proxy/api/v2/apmtelemetry.
real 0m2.638s
user 0m4.009s
sys 0m0.334s
However, if I set it to a host that does not exist over the ethernet device, it will attempt to connect and take over 30 seconds to fail:
user@host:~$ time <long java start command> -pdd -Ddd.trace.health.metrics.enabled=true -Ddd.agent.host=this-host-doesnt-exist.com java/lib/hellojava.jar
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
[dd.trace 2024-09-06 19:33:57:476 +0000] [dd-task-scheduler] INFO datadog.trace.agent.core.StatusLogger - DATADOG TRACER CONFIGURATION {"version":"1.35.0~97065ed022","os_name":"Linux","os_version":"6.1.85-ts1-amd64","architecture":"amd64","lang":"jvm","lang_version":"21.0.2-tsjgss","jvm_vendor":"N/A","jvm_version":"21.0.2-tsjgss","java_class_version":"65.0","http_nonProxyHosts":"null","http_proxyHost":"null","enabled":true,"service":"hellojava","agent_url":"http://this-host-doesnt-exist.com:8126","agent_error":true,"debug":false,"trace_propagation_style_extract":["datadog","tracecontext"],"trace_propagation_style_inject":["datadog","tracecontext"],"analytics_enabled":false,"priority_sampling_enabled":true,"logs_correlation_enabled":true,"profiling_enabled":false,"remote_config_enabled":true,"debugger_enabled":false,"debugger_exception_enabled":false,"appsec_enabled":"ENABLED_INACTIVE","telemetry_enabled":true,"telemetry_dependency_collection_enabled":true,"telemetry_log_collection_enabled":false,"dd_version":"","health_checks_enabled":true,"configuration_file":"no config file present","runtime_id":"1f517376-f108-4012-a0ef-ea1940f110f3","logging_settings":{"levelInBrackets":false,"dateTimeFormat":"'[dd.trace 'yyyy-MM-dd HH:mm:ss:SSS Z']'","logFile":"System.err","configurationFile":"simplelogger.properties","showShortLogName":false,"showDateTime":true,"showLogName":true,"showThreadName":true,"defaultLogLevel":"INFO","warnLevelString":"WARN","embedException":false},"cws_enabled":false,"cws_tls_refresh":5000,"datadog_profiler_enabled":false,"datadog_profiler_safe":true,"datadog_profiler_enabled_overridden":false,"data_streams_enabled":false}
[dd.trace 2024-09-06 19:34:01:688 +0000] [main] WARN datadog.telemetry.TelemetrySystem - Telemetry thread join was not completed
real 0m38.012s
user 0m5.508s
sys 0m0.353s
I did a network capture that shows the localhost sent back an RST immediately:
21 1.761860 127.0.0.1 → 127.0.0.1 TCP 74 40270 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=65495 SACK_PERM=1 TSval=2013823740 TSecr=0 WS=2048
22 1.761878 127.0.0.1 → 127.0.0.1 TCP 54 8126 → 40270 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
But to the agent over the network, it went into a TCP retransmission loop (the host I am attempting to connect to lives behind several layers of switches and firewalls, and does not send back an RST):
251 16.514437 x.x.x.x → y.y.y.y TCP 74 44472 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=238250723 TSecr=0 WS=2048
266 17.550506 x.x.x.x → y.y.y.y TCP 74 [TCP Retransmission] 44472 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=238251760 TSecr=0 WS=2048
396 19.598486 x.x.x.x → y.y.y.y TCP 74 [TCP Retransmission] 44472 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=238253808 TSecr=0 WS=2048
428 23.630480 x.x.x.x → y.y.y.y TCP 74 [TCP Retransmission] 44472 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=238257840 TSecr=0 WS=2048
634 26.525267 x.x.x.x → y.y.y.y TCP 74 40424 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=238260734 TSecr=0 WS=2048
645 27.534484 x.x.x.x → y.y.y.y TCP 74 [TCP Retransmission] 40424 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=238261744 TSecr=0 WS=2048
707 29.582488 x.x.x.x → y.y.y.y TCP 74 [TCP Retransmission] 40424 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=238263792 TSecr=0 WS=2048
742 33.614484 x.x.x.x → y.y.y.y TCP 74 [TCP Retransmission] 40424 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=238267824 TSecr=0 WS=2048
766 36.527430 x.x.x.x → y.y.y.y TCP 74 33904 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=238270736 TSecr=0 WS=2048
786 37.582481 x.x.x.x → y.y.y.y TCP 74 [TCP Retransmission] 33904 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=238271792 TSecr=0 WS=2048
815 39.630481 x.x.x.x → y.y.y.y TCP 74 [TCP Retransmission] 33904 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=238273840 TSecr=0 WS=2048
855 43.662508 x.x.x.x → y.y.y.y TCP 74 [TCP Retransmission] 33904 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=238277872 TSecr=0 WS=2048
890 46.899895 x.x.x.x → y.y.y.y TCP 74 41110 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=238281109 TSecr=0 WS=2048
893 47.086992 x.x.x.x → y.y.y.y TCP 74 41120 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=238281296 TSecr=0 WS=2048
894 47.359935 x.x.x.x → y.y.y.y UDP 305 59395 → 8125 Len=263
904 47.950505 x.x.x.x → y.y.y.y TCP 74 [TCP Retransmission] 41110 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=238282160 TSecr=0 WS=2048
926 48.766598 x.x.x.x → y.y.y.y UDP 247 59395 → 8125 Len=205
927 48.766633 x.x.x.x → y.y.y.y UDP 249 59395 → 8125 Len=207
928 48.767289 x.x.x.x → y.y.y.y UDP 268 59395 → 8125 Len=226
929 48.767567 x.x.x.x → y.y.y.y UDP 249 59395 → 8125 Len=207
930 48.768114 x.x.x.x → y.y.y.y UDP 259 59395 → 8125 Len=217
931 48.768306 x.x.x.x → y.y.y.y UDP 268 59395 → 8125 Len=226
932 48.768883 x.x.x.x → y.y.y.y UDP 267 59395 → 8125 Len=225
933 48.769061 x.x.x.x → y.y.y.y UDP 266 59395 → 8125 Len=224
934 48.769228 x.x.x.x → y.y.y.y UDP 264 59395 → 8125 Len=222
935 48.769381 x.x.x.x → y.y.y.y UDP 263 59395 → 8125 Len=221
936 48.769545 x.x.x.x → y.y.y.y UDP 243 59395 → 8125 Len=201
937 48.769713 x.x.x.x → y.y.y.y UDP 267 59395 → 8125 Len=225
938 48.769914 x.x.x.x → y.y.y.y UDP 245 59395 → 8125 Len=203
939 48.770073 x.x.x.x → y.y.y.y UDP 253 59395 → 8125 Len=211
940 48.770220 x.x.x.x → y.y.y.y UDP 241 59395 → 8125 Len=199
941 48.770366 x.x.x.x → y.y.y.y UDP 249 59395 → 8125 Len=207
942 48.770537 x.x.x.x → y.y.y.y UDP 257 59395 → 8125 Len=215
943 48.770692 x.x.x.x → y.y.y.y UDP 251 59395 → 8125 Len=209
944 48.770858 x.x.x.x → y.y.y.y UDP 248 59395 → 8125 Len=206
945 48.771013 x.x.x.x → y.y.y.y UDP 267 59395 → 8125 Len=225
946 48.771365 x.x.x.x → y.y.y.y UDP 257 59395 → 8125 Len=215
947 48.771559 x.x.x.x → y.y.y.y UDP 274 59395 → 8125 Len=232
948 48.771735 x.x.x.x → y.y.y.y UDP 272 59395 → 8125 Len=230
958 49.998482 x.x.x.x → y.y.y.y TCP 74 [TCP Retransmission] 41110 → 8126 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=238284208 TSecr=0 WS=2048
The DEFAULT_AGENT_TIMEOUT looks to be 10 seconds, so I'm not sure why this is taking 30 seconds to complete, other than the fact that the TCP retransmissions are adding to the delay.
Can you please advise:
- Why the tracer still attempts to initiate a connection to an agent if the environment variable is set to disable it
- If we are setting our agent timeout incorrectly
Please let me know if you need any more information. Thank you!