
SystemStackError when serializing breadcrumbs, potentially causing serious issues in Sidekiq when used in conjunction with background workers

Open drj17 opened this issue 1 year ago • 0 comments

Issue Description

Discord thread

Continuing this report from a Discord thread I created a few days ago. We have been investigating our Sidekiq workers becoming unresponsive, and we think we have traced the issue to a combination of two things.

First, a SystemStackError when serializing the breadcrumbs. Here's some relevant logging:

1765378439836037121  2024-08-28 09:10:41  2024-08-28 09:10:41  13994936442  blaze-ai-rails  34.225.253.144  Local7  Info  app/almanacworker.3  /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.5/lib/active_support/core_ext/object/instance_variables.rb:15: warning: Exception in finalizer #<Proc:0x00007fe503ce5a28 (lambda)>
1765378439836037122  2024-08-28 09:10:41  2024-08-28 09:10:41  13994936442  blaze-ai-rails  34.225.253.144  Local7  Info  app/almanacworker.3  /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.5/lib/active_support/core_ext/object/instance_variables.rb:15:in `[]': SystemStackError
1765378439836037123  2024-08-28 09:10:41  2024-08-28 09:10:41  13994936442  blaze-ai-rails  34.225.253.144  Local7  Info  app/almanacworker.3      from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.5/lib/active_support/core_ext/object/instance_variables.rb:15:in `instance_values'
1765378439836037124  2024-08-28 09:10:41  2024-08-28 09:10:41  13994936442  blaze-ai-rails  34.225.253.144  Local7  Info  app/almanacworker.3      from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.5/lib/active_support/core_ext/object/json.rb:63:in `as_json'
1765378439836037125  2024-08-28 09:10:41  2024-08-28 09:10:41  13994936442  blaze-ai-rails  34.225.253.144  Local7  Info  app/almanacworker.3      from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.5/lib/active_support/core_ext/object/json.rb:180:in `block in as_json'
1765378439836037126  2024-08-28 09:10:41  2024-08-28 09:10:41  13994936442  blaze-ai-rails  34.225.253.144  Local7  Info  app/almanacworker.3      from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.5/lib/active_support/core_ext/object/json.rb:179:in `each'

I believe https://github.com/getsentry/sentry-ruby/issues/2393 was created to address this.

A more insidious issue followed - it seemed that when this error occurred, the background thread got stuck. Eventually our Sidekiq instances stopped responding altogether. We think this was fixed by disabling background workers on Sidekiq:

  config.background_worker_threads = 0 if Sidekiq.server?

After setting this we've no longer seen crashes, but it's also possible that it just lowered the rate of issues enough that the daily cycling of Sidekiq prevented it from ever getting to the point of killing a dyno entirely.

Reproduction Steps

Unsure - potentially trying to serialize a very large object.
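To illustrate the kind of failure I suspect (not a confirmed reproduction), here is a minimal plain-Ruby sketch. `naive_serialize` and `nested_hash` are hypothetical helpers that stand in for a recursive serializer like `as_json`; they are not part of sentry-ruby or ActiveSupport. The assumption is that recursion depth grows with the nesting depth of the object being serialized, so a sufficiently deep structure exhausts the stack.

```ruby
# Hypothetical stand-in for a recursive serializer such as as_json:
# every nesting level of the input adds more frames to the call stack.
def naive_serialize(value)
  return value unless value.is_a?(Hash)

  result = {}
  value.each { |key, child| result[key.to_s] = naive_serialize(child) }
  result
end

# Build a hash nested `depth` levels deep, iteratively so that
# construction itself never recurses.
def nested_hash(depth)
  root = {}
  node = root
  depth.times do
    child = {}
    node[:child] = child
    node = child
  end
  root
end

# Returns :ok if serialization finished, :stack_overflow otherwise.
def serialization_outcome(depth)
  naive_serialize(nested_hash(depth))
  :ok
rescue SystemStackError
  :stack_overflow
end

puts serialization_outcome(10)
puts serialization_outcome(100_000)
```

With default stack sizes I would expect the shallow case to succeed and the deep case to hit SystemStackError. Whether our breadcrumb data is actually nested like this, or contains a cycle, is something I have not verified.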

Expected Behavior

  1. SystemStackErrors don't occur at all
  2. Failures in the background worker are handled gracefully
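As a rough sketch of what "handled gracefully" could mean for point 2, here is a plain-Ruby worker loop that survives a job raising SystemStackError. This is not the SDK's implementation, and I have not checked how its background worker rescues errors; `run_job_safely` and `deep_recursion` are hypothetical names. One detail that may be relevant: SystemStackError inherits directly from Exception, not StandardError, so a bare `rescue => e` does not catch it.

```ruby
# Recurse without a base case to force a SystemStackError.
def deep_recursion(n = 0)
  deep_recursion(n + 1)
end

# Run one job and report the outcome instead of letting the error
# propagate. SystemStackError is listed explicitly because it is not
# a StandardError and a bare `rescue` would miss it.
def run_job_safely(job)
  job.call
  :ok
rescue StandardError, SystemStackError => e
  warn "background job failed: #{e.class}"
  :dropped
end

queue = Queue.new
results = Queue.new

# A single worker thread that keeps draining the queue after a failure.
worker = Thread.new do
  while (job = queue.pop) != :stop
    results << run_job_safely(job)
  end
end

queue << -> { deep_recursion } # fails with SystemStackError
queue << -> { :sent }          # should still run afterwards
queue << :stop
worker.join

outcomes = Array.new(results.size) { results.pop }
p outcomes
```

The point of the sketch is only that the second job still runs after the first one overflows the stack, which is the behavior we would hope for from the background worker.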

Actual Behavior

  1. SystemStackError occurs while serializing a breadcrumb
  2. Sidekiq process becomes unresponsive

Ruby Version

3.1.3

SDK Version

5.9.0

Integration and Its Version

No response

Sentry Config

No response

drj17, Sep 06 '24 16:09