sentry-ruby icon indicating copy to clipboard operation
sentry-ruby copied to clipboard

sentry-sidekiq reports exceptions for jobs with `dead: false` when `report_after_job_retries` is true

Open eapache-opslevel opened this issue 1 year ago • 2 comments

Issue Description

We have set up sentry-sidekiq with report_after_job_retries = true.

We have a job which we want to retry, but where we don't care if it ultimately fails, so we have configured that job with sidekiq_options dead: false to prevent it from going to the sidekiq dead set.

Exceptions from this job do not hit sentry while it is retrying (good), but we do still get a sentry exception for a dead job when it exhausts its retries (bad), even though the job never actually hits the dead set.

Arguably report_after_job_retries is behaving as intended, since the job has exhausted its retries in this case; I can see the case that this is really a feature request for a new flag (only_report_dead_jobs or something). But IMO the fact that the exception comes in to Sentry with SidekiqJob.dead = True is a bug, because the job is very definitely not hitting the dead set.

Reproduction Steps

  • configure sentry-sidekiq with report_after_job_retries = true
  • create a sidekiq job with dead: false that raises an exception
  • execute that job, let it exhaust its retries
  • see the sentry exception about a dead job

Expected Behavior

As discussed in the issue description, there are a couple of possible behaviours here that could make sense, but I don't think the current one does.

Actual Behavior

sentry-sidekiq sends dead-job exceptions for jobs that skip the dead set

Ruby Version

3.2.5

SDK Version

5.18.1

Integration and Its Version

Sidekiq 7.3.0

Sentry Config

No response

eapache-opslevel avatar Aug 09 '24 16:08 eapache-opslevel

@eapache-opslevel will take a look at this when I have time. Do you have a suggested fix since you've already given it some thought?

sl0thentr0py avatar Aug 12 '24 11:08 sl0thentr0py

I have a couple of thoughts here.

The simplest change would be to add another condition to the code around https://github.com/getsentry/sentry-ruby/blob/8c1c259f7d288052e90dea15e3aa8b48c5c21c20/sentry-sidekiq/lib/sentry/sidekiq/error_handler.rb#L22-L30 such that no exception is reported when report_after_job_retries is true and the job would skip the dead set, even if it has exhausted its retries. That's the behaviour I had intuitively expected, but it is also a potentially dangerous behavioural change for anybody relying on the current behaviour.

Safer would be to add another config option called something like report_jobs_that_skip_dead_set (defaults to true to maintain current behaviour), where setting it to false would suppress errors that come in from jobs configured with dead: false. Though how this interacts with report_after_job_retries is a potentially interesting question.

Not sure which of these seems more correct to you, or whether there's a third option that gives the best of both worlds.

eapache-opslevel avatar Aug 12 '24 14:08 eapache-opslevel

To keep the semantics simple, I've added a new boolean report_only_dead_jobs that will ignore jobs with dead: false completely.

sl0thentr0py avatar Mar 13 '25 16:03 sl0thentr0py