dd-trace-rb
dd-trace-rb copied to clipboard
Failed to open TCP connection after upgrading to v0.54.2
After upgrading to ddtrace v0.54.2, we started receiving these errors during app shutdown:
E, [2022-01-21T18:52:02.250127 #4] ERROR -- ddtrace: [ddtrace] (/app/vendor/bundle/ruby/3.0.0/gems/ddtrace-0.54.2/lib/ddtrace/transport/http/client.rb:37:in `rescue in send_request') Internal error during HTTP transport request. Cause: Failed to open TCP connection to 127.0.0.1:8126 (Connection refused - connect(2) for "127.0.0.1" port 8126) Location: /app/vendor/ruby-3.0.3/lib/ruby/3.0.0/net/http.rb:987:in `initialize'
E, [2022-01-21T18:52:04.189624 #4] ERROR -- ddtrace: [ddtrace] (/app/vendor/bundle/ruby/3.0.0/gems/ddtrace-0.54.2/lib/ddtrace/transport/http/client.rb:37:in `rescue in send_request') Internal error during HTTP transport request. Cause: Failed to open TCP connection to 127.0.0.1:8126 (Connection refused - connect(2) for "127.0.0.1" port 8126) Location: /app/vendor/ruby-3.0.3/lib/ruby/3.0.0/net/http.rb:987:in `initialize'
We upgraded ddtrace on 1/21, and starting with the first restart (which does a shutdown of our Heroku dynos before booting up new ones) after that, this error started showing up in our logs. It always seems to occur twice per shutdown. The version of our agent did not change from 7.32.4-1.
I wish I had more to go on here, but hopefully someone has a hunch about what might be going on here! Thanks so much.
Hey @agrobbin, do you know what version of ddtrace you've upgraded from, so we can narrow our search domain?
@marcotc yep! We went from 0.54.1 to 0.54.2, which also included the following gem updates:
- debase-ruby_core_source 0.10.12 -> 0.10.14
- msgpack 1.4.2 -> 1.4.3
Thanks for the info @agrobbin.
The changes between 0.54.1 and 0.54.2 only really affect profiling (https://my.diffend.io/gems/ddtrace/0.54.1/0.54.2).
The log you shared indicates an issue reporting tracing data (lib/ddtrace/transport/http/client.rb:37
) which... is a bit unexpected 🤔 .
Can you give us the following information so we can investigate further:
- Is profiling enabled for the application where you saw this issue?
- Could you share your configuration (
Datadog.configure
/DD_
environment variables)? - Did you end up staying on 0.54.2, or did you revert back to 0.54.1? If you reverted back, did the message disappear?
Happy to share some more details! We do not have profiling enabled (we trialed it at some point in the past, months ago, but it is not currently enabled). Here is our configuration:
Datadog.configure do |c|
c.diagnostics.startup_logs.enabled = false
c.runtime_metrics.enabled = true # Private Beta
c.sampling.default_rate = 1.0
if Rails.configuration.x.datadog_enabled
c.use :aws, service_name: '[filtered]-aws'
c.use :faraday, split_by_domain: true
c.use :httpclient, split_by_domain: true
c.use :http, split_by_domain: true
c.use :rails, log_injection: true, service_name: '[filtered]', job_service: '[filtered]-worker'
c.use :redis, service_name: '[filtered]-redis'
c.use :rest_client, split_by_domain: true
c.use :sidekiq, service_name: '[filtered]-worker', client_service_name: '[filtered]-worker'
else
c.tracer.enabled = false
end
end
config.x.datadog_enabled
is implemented as follows:
config.x.datadog_enabled = ENV['DISABLE_DATADOG_AGENT'].blank? && ENV['DD_API_KEY'].present?
And here are our DD_*
env variables:
DD_AGENT_MAJOR_VERSION=7
DD_API_KEY=[filtered]
DD_APM_NON_LOCAL_TRAFFIC=true
DD_DYNO_HOST=true
DD_ENV=production
DD_LOG_LEVEL=error
DD_PROCESS_AGENT=true
DD_RUM_APPLICATION_ID=[filtered]
DD_RUM_CLIENT_TOKEN=[filtered]
The DD_RUM_*
env variables are our own convention, and I don't believe are looked at by dd-trace, just in case there was any confusion there! We are running in Heroku, on the heroku-20
stack, with the latest DataDog buildpack (v7.33.0-1
).
I have not reverted us back, but I can do that in our staging environment (all the same config, except DD_ENV
of course) and see if the error disappears.
I was able to try this in our staging environment, and it does seem to be happening on v0.54.1 as well. It's definitely weird that we didn't see this output previously, but it's definitely possible I simply mistyped something when doing my log searches previously!
E, [2022-02-16T17:55:16.539285 #4] ERROR -- ddtrace: [ddtrace] (/app/vendor/bundle/ruby/3.1.0/gems/ddtrace-0.54.1/lib/ddtrace/transport/http/client.rb:37:in `rescue in send_request') Internal error during HTTP transport request. Cause: Failed to open TCP connection to 127.0.0.1:8126 (Connection refused - connect(2) for 127.0.0.1:8126) Location: /app/vendor/ruby-3.1.0/lib/ruby/3.1.0/socket.rb:1214:in `__connect_nonblock'
Cool, thanks for experimenting with 0.54.1 and confirming that it happens there as well.
Since you mentioned these logs show up on app shutdown, I wonder if the issue is related to some kind of race, where heroku signals both the datadog agent as well as the app to shut down, but the agent cleans up and finishes faster. Hmmm....
I'll need to loop in someone else to help with this investigation. Let me get back to you on next steps :)
Hey @agrobbin sorry for the delay getting back to you.
Would you mind reaching out to support via https://www.datadoghq.com/support/ and including a link to this ticket?
This way support can investigate the ticket in the context of your account and specific agent setup.
@ivoanjo done!
hello @agrobbin, any news on how to fix that issue?
I receive the same error on my Heroku review apps, even when DISABLE_DATADOG_AGENT=true
@akapronik hopefully @agrobbin can chime in with his learnings, but I do still recommend pinging support via https://www.datadoghq.com/support/ , as failures to contact the agent may be caused by a number of different causes.
@akapronik @ivoanjo unfortunately I don't have any news here. I raised the issue with Datadog Support, and the last update I received was that Engineering is looking into the issue. Sorry to not be of more assistance!
I am also receiving this error on Cloud Foundry (which is similar to Heroku) on 0.54.2.
gem 'ddtrace', require: 'ddtrace/auto_instrument'
gem 'dogstatsd-ruby'
DD_TRACE_ENABLED: true
DD_RUNTIME_METRICS_ENABLED: true
Note I don't use an initialiser. The error occurs during deployment when the application is starting.
WARN -- ddtrace: [ddtrace] DATADOG TRACER DIAGNOSTIC - Agent Error: Datadog::Transport::InternalErrorResponse ok?: unsupported?:, not_found?:, client_error?:, server_error?:, internal_error?:true, payload:, error_type:Errno::ECONNREFUSED error:Failed to open TCP connection to 127.0.0.1:8126 (Connection refused - connect(2) for "127.0.0.1" port 8126)
-- | --
ERROR -- ddtrace: [ddtrace] (/home/vcap/deps/1/vendor_bundle/ruby/2.7.0/gems/ddtrace-0.54.2/lib/ddtrace/transport/http/client.rb:37:in `rescue in send_request') Internal error during HTTP transport request. Cause: Failed to open TCP connection to 127.0.0.1:8126 (Connection refused - connect(2) for "127.0.0.1" port 8126) Location: /home/vcap/deps/1/ruby/lib/ruby/2.7.0/net/http.rb:960:in `initialize'
Are there any updates to this issue?
@sterankin you mention "during deployment"; I wonder if your situation is similar to the Heroku issue where during push/buildpack build a few errors can show up in the log (see https://docs.datadoghq.com/agent/guide/heroku-ruby/#traces )...?
If you see the issue after deployment, when the service is running normally, then that's not expected at all, and I would also ask you to file a ticket with support so that we can look into specifics of your setup.
Unfortunately the "Failed to open TCP connection" error itself can be caused by many things:
- The Datadog agent is not running at all
- There is some misconfiguration preventing dd-trace-rb from reaching the Datadog agent
- There are networking issues
- Bug in dd-trace-rb, although the agent reporting code is something we don't touch often and hasn't been changed recently
- Bug in the Datadog agent
- etc...
This is why I advise filing a ticket so that we can contextually look into the root cause in each situation, which can be quite different 😄
@sterankin you mention "during deployment"; I wonder if your situation is similar to the Heroku issue where during push/buildpack build a few errors can show up in the log (see https://docs.datadoghq.com/agent/guide/heroku-ruby/#traces )...?
If you see the issue after deployment, when the service is running normally, then that's not expected at all, and I would also ask you to file a ticket with support so that we can look into specifics of your setup.
Unfortunately the "Failed to open TCP connection" error itself can be caused by many things:
- The Datadog agent is not running at all
- There is some misconfiguration preventing dd-trace-rb from reaching the Datadog agent
- There are networking issues
- Bug in dd-trace-rb, although the agent reporting code is something we don't touch often and hasn't been changed recently
- Bug in the Datadog agent
- etc...
This is why I advise filing a ticket so that we can contextually look into the root cause in each situation, which can be quite different 😄
This could be correct, I will open a ticket but it could be a different issue than the op 👍
FYI, I added these variables to Heroku production environments, and the issue has gone.
DD_TRACE_ENABLED=true
DD_APM_ENABLED=true
for staging, I added this, and it also fixes the issue
DD_TRACE_ENABLED=false
It works with the latest ddtrace
version 0.54.2
We are receiving a similar error with our ddtrace on our Rails app on Heroku, but as far as I know its limited to the console. When connect to our app's Rails console via Heroku CLI, the following message is generated about once every minute:
ERROR -- ddtrace: [ddtrace] (/app/vendor/bundle/ruby/2.7.0/gems/ddtrace-1.15.0/lib/datadog/profiling/http_transport.rb:60:in `export') Failed to report profiling data ({:agent=>"http://127.0.0.1:8126/"}): failed ddog_prof_Exporter_send: error trying to connect: tcp connect error: Connection refused (os error 111): tcp connect error: Connection refused (os error 111): Connection refused (os error 111)
Unfortunately the DevOps that set this up is no longer with the company, so I'm doing my best to resolve this annoyance while learning the basics of Datadog. We are using v1.15 of the ddtrace
gem.
I tried @akapronik 's suggestion of disabling the DD_TRACE_ENABLED
env var but that does not appear to work.
As much as I can tell, the profiler is working - I see data when I view the Profiling page in Datadog.
Hey @johnpitchko :wave:
I suspect what's happening is that when running the heroku CLI, the Datadog agent doesn't get started (which seems reasonable -- you may not want to get data from the console).
To disable the profiler just for that environment, you can use the DD_PROFILING_ENABLED
environment variable; the variable you mentioned only controls distributed tracing -- although this is a bit of a historical naming confusion.
Let me know if this works for you!
Hey @johnpitchko, hopefully my suggestion was helpful and no news is good news! Let us know if we can help any further :)
Since this ticket was originally about a now-old ddtrace version (0.54.2), I'm going to go ahead and close it. Please do open a new issue if you see this again / run into any other issue :)
@ivoanjo I was successful in resolving this issue a few weeks ago - apologies for not responding sooner and thank you for your input!
Nice, glad to know! :)