`undefined method 'send_default_pii' for nil`

Open crespire opened this issue 1 month ago • 33 comments

Issue Description

On server startup, before the Sentry configuration is fully initialized, there appear to be code paths where Sentry.configuration is nil.

This is very similar to other previous issues I've encountered around Sentry.configuration initialization: https://github.com/getsentry/sentry-ruby/issues/2782 https://github.com/getsentry/sentry-ruby/issues/2386

We should generally guard against non-initialized Sentry.configuration calls by ensuring that Sentry is initialized.

Currently, the method just exits early: https://github.com/getsentry/sentry-ruby/blob/e219832b57c5aa32e41be4462dbe421c8c58bc67/sentry-ruby/lib/sentry-ruby.rb#L164C1-L167C8

Could it be changed such that if we are not initialized, we actually initialize Sentry? I feel this would prevent a whole class of NoMethodError-on-nil errors I've personally run into.
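
For illustration, the kind of guard every caller effectively needs today looks roughly like this (a sketch; the helper name is made up):

# Sketch of the nil-safe pattern callers end up writing around Sentry.configuration.
def sentry_pii_allowed?
  return false unless Sentry.initialized?

  Sentry.configuration.send_default_pii
end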

Reproduction Steps

During app boot (after a new deploy for example), we occasionally run into this type of error.

Expected Behavior

Sentry does not cause a 500.

Actual Behavior

Sentry causes a 500 with no backtrace.

Ruby Version

3.4.4

SDK Version

6.1.1

Integration and Its Version

Rails 8.0.2

Sentry Config

Default

crespire avatar Nov 27 '25 21:11 crespire

RUBY-120

linear[bot] avatar Nov 27 '25 21:11 linear[bot]

I agree that this issue has gotten out of hand, but fixing it is not trivial and requires some core refactoring.

Could it be changed such that if we are not initialized, we actually initialize Sentry?

No, we cannot do that, because that's not how the SDK is designed; it would defeat the purpose of having an initialize at all. We could always have a fallback configuration, but that could cause weird behavior if not done properly.

Sentry is supposed to be initialized extremely early in your app startup process, could you try moving it (prefixing the initializer with 01_, for instance) so that it runs first?
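
Concretely, that would be something like the following (sketch; assuming the initializer currently lives at config/initializers/sentry.rb):

# config/initializers/01_sentry.rb
# Same contents as the existing Sentry initializer; the numeric prefix just
# makes it load before the other initializers.
Sentry.init do |config|
  # ... existing configuration ...
end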

Alternatively, can you tell me which particular check fails in your case so I can fix it?

sl0thentr0py avatar Dec 01 '25 13:12 sl0thentr0py

I can try to manipulate the loading order, let me see and I'll report back.

crespire avatar Dec 02 '25 12:12 crespire

I was chatting with Nate Matykiewicz, and he thought through why this might be happening.

  1. Puma receives request
  2. Signal is sent to shut down puma (stop taking new requests, gracefully terminate within 30 seconds)
  3. at_exit fires causing Sentry to shut down
  4. Puma is still processing that final request
  5. 500 errors because the at_exit hook nil'd out @main_hub, causing Sentry.configuration to be nil
  6. Puma shuts down
  7. Puma boots
  8. Sentry.init is called
  9. Sentry.configuration is not nil now

The theory is that there is some race condition in the shutdown sequence causing this issue, so moving the initializer up may not solve the problem.

In any case, we've made and deployed the initializer order change on our app, so we'll keep an eye on this, but it would be interesting to explore the shutdown sequencing to see if that might actually be the root cause of the problem.
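
To make the theory concrete, the race can be sketched outside of Puma roughly like this (illustration only, not our app code):

# A request thread outlives Sentry's at_exit close, so Sentry.configuration
# returns nil and any method call on it raises NoMethodError.
require "sentry-ruby"

Sentry.init { |config| config.dsn = nil } # no DSN needed for the illustration; nothing is sent

request_thread = Thread.new do
  sleep 0.1
  Sentry.configuration.send_default_pii # NoMethodError for nil if close already ran
end

Sentry.close # what the SDK's at_exit hook does during shutdown
request_thread.join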

crespire avatar Dec 08 '25 16:12 crespire

The initializer order change has been out for about 2 hours and we are still getting this error, so it seems that it may be a race condition on exit.

crespire avatar Dec 08 '25 20:12 crespire

We're now also getting an error that the sdk_logger method doesn't exist, again on configuration. We've reverted the initializer order change, as moving it up seemed to cause more errors than it solved.

crespire avatar Dec 08 '25 20:12 crespire

I've added some custom middleware as well as some pre and post hook logging to see if I can get some more context about what's happening.
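
The middleware is shaped roughly like this (a simplified sketch, not the exact file):

# lib/middleware/sentry_guard.rb (sketch)
# Logs extra context when a NoMethodError on a nil Sentry.configuration bubbles up.
class SentryGuard
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  rescue NoMethodError => e
    warn "[SentryGuard] #{e.message}; Sentry.initialized?=#{Sentry.initialized?}"
    raise
  end
end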

crespire avatar Dec 09 '25 14:12 crespire

Thanks @crespire. If this is mainly at shutdown, we could simply detect Puma somehow and make the close a bit safer. I will try to reproduce. However, if it is at boot time as in the original report, that points to a different problem.

sl0thentr0py avatar Dec 09 '25 14:12 sl0thentr0py

can you give me the main parts of your puma config?

  • do you use preload_app!?
  • what are your workers and threads set to?
  • what signal do you use to restart and shutdown your puma workers?

sl0thentr0py avatar Dec 09 '25 14:12 sl0thentr0py

Just to be clear, I don’t have any evidence for my theory. I was just reading code trying to understand how @main_hub would be nil, and it seems like the answer is:

  1. Before Sentry.init
  2. After Sentry.close

Puma should not be accepting requests before the Rails app has been fully initialized, which seems to rule out option 1, which left me wondering how option 2 could occur.

natematykiewicz avatar Dec 09 '25 14:12 natematykiewicz

I agree that 1 is unlikely too

sl0thentr0py avatar Dec 09 '25 15:12 sl0thentr0py

can you give me the main parts of your puma config?

  • do you use preload_app!?
  • what are your workers and threads set to?
  • what signal do you use to restart and shutdown your puma workers?

Yes, we use preload_app! in production. workers is set to 4 and threads to 5. The signal we use is SIGTERM, as that is the standard signal Cloud Run uses.
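
So the relevant parts of config/puma.rb look roughly like this (sketch based on the answers above; min and max threads assumed equal):

# config/puma.rb (sketch of the relevant settings)
preload_app!
workers 4
threads 5, 5
# Cloud Run stops instances with SIGTERM, which Puma treats as a graceful shutdown.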

crespire avatar Dec 09 '25 17:12 crespire

@sl0thentr0py I appreciate you looking into this issue!

Is there any way to escalate this? We are seeing this in production and it is affecting live clients. Our next step will be disabling Sentry entirely and moving to another APM if this continues for much longer.

multiplegeorges avatar Dec 11 '25 15:12 multiplegeorges

  • do you all belong to the same org, or is this a larger issue?
  • An actual stack trace of the failing line would be very useful; otherwise we are talking about a larger refactor (which is what this issue is about), which is not planned at the moment
  • as a stop gap, I will make some changes to flushing/closing which will also mitigate this, roughly along the lines sketched below (this will be released next week at the earliest)
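
Sketch of the two options (illustration only, assuming the client's flush method; not the final change):

# What the at_exit behavior could look like under the two options.
at_exit do
  if defined?(::Puma)
    # Only flush pending events under Puma instead of tearing the hub down.
    Sentry.get_current_client&.flush
  else
    # Current behavior: full close, which nils out the main hub.
    Sentry.close
  end
end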

sl0thentr0py avatar Dec 11 '25 15:12 sl0thentr0py

We did also recently update the Sentry gem from a version pre-metrics deprecation, and the error started happening then. Let me see if I can get the exact gem versions we moved to and from.

crespire avatar Dec 11 '25 15:12 crespire

More info is always appreciated so I can repro:

  • relevant versions
  • initializer config
  • which gems (sentry-ruby/rails/sidekiq etc)

sl0thentr0py avatar Dec 11 '25 15:12 sl0thentr0py

This probably isn't all that helpful, but I don't believe I'm having this issue. Then again, I'm fairly behind on upgrades.

    sentry-rails (5.23.0)
      railties (>= 5.0)
      sentry-ruby (~> 5.23.0)
    sentry-ruby (5.23.0)
      bigdecimal
      concurrent-ruby (~> 1.0, >= 1.0.2)
    sentry-sidekiq (5.23.0)
      sentry-ruby (~> 5.23.0)
      sidekiq (>= 3.0)

natematykiewicz avatar Dec 11 '25 15:12 natematykiewicz

Oh and to answer your question, I’m not in an org with either of these other guys. I know them, but we don’t work together.

natematykiewicz avatar Dec 11 '25 15:12 natematykiewicz

@multiplegeorges and I are in the same org.

Previous:

    sentry-rails (5.24.0)
      railties (>= 5.0)
      sentry-ruby (~> 5.24.0)
    sentry-ruby (5.24.0)
      bigdecimal
      concurrent-ruby (~> 1.0, >= 1.0.2)

Current:

    sentry-rails (6.1.1)
      railties (>= 5.2.0)
      sentry-ruby (~> 6.1.1)
    sentry-ruby (6.1.1)
      bigdecimal
      concurrent-ruby (~> 1.0, >= 1.0.2)

The new middleware I added did provide some backtrace context, but it simply points to the call method on the new middleware, so it's not very helpful. We still see the pattern where we render the error via Rails, but I can't find the request ID in any of our logging.

crespire avatar Dec 11 '25 15:12 crespire

Where are you guys seeing these error details? Is it being reported to your Sentry account, or is it only in log files?

natematykiewicz avatar Dec 11 '25 15:12 natematykiewicz

Where are you guys seeing these error details? Is it being reported to your Sentry account, or is it only in log files?

It's actually being rendered by a Rails ErrorsController that we've set up.

crespire avatar Dec 11 '25 15:12 crespire

So there’s no mention of it in your Sentry account right? (I’d assume not, since it can’t find the configuration.)

natematykiewicz avatar Dec 11 '25 15:12 natematykiewicz

So there’s no mention of it in your Sentry account right? (I’d assume not, since it can’t find the configuration.)

Correct. I'm also not seeing the request ID in GCP logging, which is strange. We may downgrade the sentry gem to the last 5.x version to see if this improves things.

crespire avatar Dec 11 '25 15:12 crespire

Okay, so I've got some updates as well.

Along with the middleware, I also added companion initializers around the Sentry initializer that use the same at_exit mechanism.

I also added a hook to log on Puma's on_worker_shutdown so I could get some context information.
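
The Puma hook is roughly this (a simplified sketch of the config/puma.rb addition):

# config/puma.rb (sketch of the diagnostic shutdown hook)
require "json"
require "time"

on_worker_shutdown do
  $stdout.puts({
    message: "[Sentry Diagnostics] Puma on_worker_shutdown",
    sentry_initialized: defined?(Sentry) ? Sentry.initialized? : nil,
    pid: Process.pid,
    timestamp: Time.now.utc.iso8601
  }.to_json)
  $stdout.flush
end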

I noticed today in our logging that we are getting a lot of entries from the at_exit hooks surrounding the Sentry initializer, but I don't see any of the corresponding on_worker_shutdown logs from Puma.

Here is a sample pair of at_exit log entries.

Early hook:

{
  "jsonPayload": {
    "exit_status": "Redis::CannotConnectError",
    "message": "[Sentry Diagnostics] at_exit hook (late registered, runs early)",
    "sentry_state": {
      "configuration_present": true,
      "initialized": true,
      "background_worker_present": true,
      "background_worker_thread_status": null,
      "main_hub_present": true,
      "background_worker_thread_alive": null
    },
    "pid": 21554,
    "timestamp": "2025-12-11T16:24:20Z",
    "thread_count": 20
  },
  // other GCP meta info
}

Late hook:

{
  "jsonPayload": {
    "pid": 21554,
    "exit_status": "Redis::CannotConnectError",
    "sentry_state": {
      "background_worker_thread_status": null,
      "initialized": true,
      "background_worker_thread_alive": null,
      "main_hub_present": true,
      "configuration_present": true,
      "background_worker_present": true
    },
    "message": "[Sentry Diagnostics] at_exit hook (early registered, runs late)",
    "thread_count": 20,
    "timestamp": "2025-12-11T16:24:20Z"
  },
  // Other GCP meta info
}

And here are the files in question:

# config/initializers/sentry_diagnostics.rb

# typed: false
# frozen_string_literal: true

# Sentry Diagnostics
#
# This initializer adds logging to help diagnose the intermittent
# "undefined method for nil:NilClass" errors with Sentry.configuration.
#
# See: https://github.com/getsentry/sentry-ruby/issues/2800
#
# We register an at_exit hook EARLY (before Sentry's own hook) so it runs LATE
# during shutdown. This helps us understand the order of teardown.

if Rails.env.production?
  # This runs LATE during shutdown (at_exit hooks run in reverse order of registration)
  at_exit do
    sentry_state = begin
      worker = Sentry.background_worker
      worker_thread = worker&.instance_variable_get(:@thread)
      {
        initialized: Sentry.initialized?,
        main_hub_present: !Sentry.instance_variable_get(:@main_hub).nil?,
        configuration_present: !Sentry.configuration.nil?,
        background_worker_present: !worker.nil?,
        background_worker_thread_alive: worker_thread&.alive?,
        background_worker_thread_status: worker_thread&.status
      }
    rescue => e
      {error: e.class.name, message: e.message}
    end

    log_entry = {
      severity: "INFO",
      message: "[Sentry Diagnostics] at_exit hook (early registered, runs late)",
      timestamp: Time.now.utc.iso8601,
      pid: Process.pid,
      exit_status: $!&.class&.name || "clean",
      sentry_state:,
      thread_count: Thread.list.count
    }

    $stdout.puts log_entry.to_json
    $stdout.flush
  rescue => e
    # Don't let diagnostics crash the shutdown
    warn "[Sentry Diagnostics] at_exit logging failed: #{e.message}"
  end
end

And the late hook:

# config/initializers/zzz_sentry_diagnostics_late.rb

# typed: false
# frozen_string_literal: true

# Sentry Diagnostics - Late Registration
#
# This file is named with 'zzz_' prefix to ensure it loads AFTER other initializers,
# including Sentry's own initializer. The at_exit hook registered here will run
# EARLY during shutdown (before Sentry's cleanup).
#
# By comparing logs from both hooks, we can determine if Sentry's state changes
# during the shutdown sequence.
#
# See: https://github.com/getsentry/sentry-ruby/issues/2800

if Rails.env.production?
  # This runs EARLY during shutdown (registered late, so runs first in reverse order)
  at_exit do
    sentry_state = begin
      worker = Sentry.background_worker
      worker_thread = worker&.instance_variable_get(:@thread)
      {
        initialized: Sentry.initialized?,
        main_hub_present: !Sentry.instance_variable_get(:@main_hub).nil?,
        configuration_present: !Sentry.configuration.nil?,
        background_worker_present: !worker.nil?,
        background_worker_thread_alive: worker_thread&.alive?,
        background_worker_thread_status: worker_thread&.status
      }
    rescue => e
      {error: e.class.name, message: e.message}
    end

    log_entry = {
      severity: "INFO",
      message: "[Sentry Diagnostics] at_exit hook (late registered, runs early)",
      timestamp: Time.now.utc.iso8601,
      pid: Process.pid,
      exit_status: $!&.class&.name || "clean",
      sentry_state:,
      thread_count: Thread.list.count
    }

    $stdout.puts log_entry.to_json
    $stdout.flush
  rescue => e
    # Don't let diagnostics crash the shutdown
    warn "[Sentry Diagnostics] at_exit logging failed: #{e.message}"
  end

  # Also log that we've successfully registered the hook
  Rails.logger.info "[Sentry Diagnostics] Late at_exit hook registered. Sentry.initialized?=#{Sentry.initialized?}"
end

There doesn't seem to be a state change. This particular pair has a Redis error, but I also see pairs with clean as the exit_status that exhibit the same tight timing, so I think the Redis error is a red herring.

That being said, I don't see any Puma-related worker shutdown logging, so I'm wondering why we see such a volume of at_exit logging. Maybe I am misunderstanding at_exit usage here? If so, I know that the Sentry gem also uses at_exit to call close, so maybe that's the problem?

crespire avatar Dec 11 '25 16:12 crespire

I know that the Sentry gem is also using this method to call close, so maybe that's the problem?

Yes, this is exactly the problem, but I need to investigate a bit to see which of these is the best solution:

  • go back to just flushing and not closing on exit
  • doing something conditional at exit just for Puma (if I can reproduce)

Last question: are you using config.enable_logs = true? In general, your whole Sentry.init block would be very helpful since you haven't posted it here yet (you can leave out sensitive stuff, if any).

sl0thentr0py avatar Dec 11 '25 17:12 sl0thentr0py

Here is our whole file with sensitive info removed:

# typed: false

Sentry.init do |config|
  config.dsn = ""
  config.breadcrumbs_logger = [:active_support_logger, :http_logger]

  # Set the environment
  config.environment = Rails.env
  config.enabled_environments = %w[production]

  # Release tracking
  config.release = ENV["HEROKU_SLUG_COMMIT"] || ENV["GIT_COMMIT"] || `git rev-parse HEAD`.chomp

  # Configure which exceptions should not be reported
  config.excluded_exceptions += [
    "ActionController::RoutingError",
    "ActiveRecord::RecordNotFound",
    "Sidekiq::SoftRetry"
  ]

  # Filter sensitive data
  config.before_send = lambda do |event, hint|
    # Filter out sensitive parameters
    event.request&.data&.delete(:password)
    event.request&.data&.delete(:password_confirmation)
    event
  end

  # Set traces_sample_rate to 1.0 to capture 100%
  # of transactions for tracing.
  # We recommend adjusting this value in production.
  config.traces_sample_rate = 0.2

  # Set profiles_sample_rate to profile 100%
  # of sampled transactions.
  # We recommend adjusting this value in production.
  # eg. traces_sample_rate = 0.2, profiles_sample_rate = 1.0 means 20% of all transactions are profiled
  # eg. traces_sample_rate = 0.2, profiles_sample_rate = 0.5 means 10% of all transactions are profiled
  config.profiles_sample_rate = 1.0
end

crespire avatar Dec 11 '25 17:12 crespire

I am still getting reports that this is happening after reverting to 5.28.1 - I may try to go back to the version we were using before, which was 5.24.0.

On 5.28.1, the error is undefined method 'rails' for nil, which is different from the send_default_pii error we were seeing on 6.1.1.

crespire avatar Dec 11 '25 19:12 crespire

What if you do a git bisect sort of approach and go to 5.26.0?

natematykiewicz avatar Dec 11 '25 19:12 natematykiewicz

My plan is to see if 5.24.0 calms things down, then do minor version bumps from there until the problem resurfaces. If 5.24.0 also has issues, then I'll be really baffled, as we were on that version of the gem for so long without issues.
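
For example, pinning in the Gemfile while stepping through versions (sketch):

# Gemfile (sketch): pin the SDK while bisecting versions.
gem "sentry-ruby", "5.24.0"
gem "sentry-rails", "5.24.0"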

crespire avatar Dec 11 '25 19:12 crespire

Some updates. We are still encountering these errors on 5.24.0, but now the error is back to the rails method. I'm not sure if that's helpful.

Searching generally in our logging for Sentry, I am noticing that we have a few background worker failures:

{
  "jsonPayload": {
    "level": "error",
    "line": 14,
    "timestamp": "2025-12-12T13:52:45.455212Z",
    "thread": "worker-51",
    "name": "Rails",
    "level_index": 4,
    "file": "/usr/local/bundle/gems/sentry-ruby-5.24.0/lib/sentry/utils/logging_helper.rb",
    "host": "sidekiq-worker-6bf668667d-pjtps",
    "environment": "production",
    "application": "Semantic Logger",
    "message": "sentry -- exception happened in background worker: Failed to open TCP connection to <URL trimmed just in case> (execution expired)",
    "pid": 1
  },
  "timestamp": "2025-12-12T13:52:45.455500820Z"
}

I am not currently sure if this is correlated (timing-wise) with the errors we're getting in userland, but I thought I'd put it here in case it's helpful.

Because this was our "known good" version and the error is still happening, I'm going to look at other changes made in the same commit for other clues.

I also found some weird orphaned logging, but it looks like part of a stack trace terminating in my custom middleware:

2025-12-12 09:14:29.010
  /usr/local/bundle/gems/sentry-rails-5.24.0/lib/sentry/rails/controller_transaction.rb:38:in 'Sentry::Rails::ControllerTransaction#sentry_around_action'
2025-12-12 09:14:29.010
  /usr/local/bundle/gems/sentry-rails-5.24.0/lib/sentry/rails/tracing.rb:56:in 'Sentry::Rails::Tracing::SentryNotificationExtension#instrument'
2025-12-12 09:14:29.010
  /app/lib/middleware/sentry_guard.rb:38:in 'SentryGuard#call'

I'm going to investigate this as well, since an issue here would also make sense. The timing of this logging matches when a user experienced the error this morning, so it seems promising.

crespire avatar Dec 12 '25 14:12 crespire