
Celery integration captures retried exceptions that it shouldn't when one task directly calls another

Open lpsinger opened this issue 5 years ago • 7 comments

I have one Celery task that directly calls another, but as a function, not as a task. The first task is auto-retried for a certain set of exceptions, and the second task is auto-retried for a different set of exceptions. There is an example task like this in my actual codebase, but here is an isolated illustration of the problem.

Consider the following two tasks: bar, which is auto-retried for exceptions A or B, and a wrapper task bat, which is auto-retried for exceptions C or D.

def foo():
    ...  # do something that might raise exception A, B, C, or D

@app.task(autoretry_for=(A, B))
def bar():
    return foo()

@app.task(autoretry_for=(C, D))
def bat():
    return bar()

Now invoke bat:

bat.delay()

Now suppose that during the execution of bat, foo() raises exception C. Even though Celery will retry the task, the exception C will be reported to Sentry.

lpsinger avatar May 21 '19 01:05 lpsinger

We generally don't have support for autoretry_for, but we should fix that. If you raise Retry instead it should work fine
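
For illustration, here is a minimal sketch of that suggestion, reusing the foo/bar names from the example above and assuming a bound task; self.retry() raises celery.exceptions.Retry under the hood, which the Celery integration does not report:

@app.task(bind=True)
def bar(self):
    try:
        return foo()
    except (A, B) as exc:
        # self.retry() raises celery.exceptions.Retry, which the Sentry
        # Celery integration recognizes and does not capture as an error.
        raise self.retry(exc=exc)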

untitaker avatar May 21 '19 11:05 untitaker

Hello :wave: Is it documented that autoretry_for for Celery tasks is not supported? I just stumbled upon an issue with auto-retried errors that appeared in Sentry, and I'm not sure if this is a bug or expected behavior.

Best regards

pgrzesik avatar Mar 31 '20 10:03 pgrzesik

It's a bug, this bug :) I'll prioritize it now since you're the second person to have commented on this

untitaker avatar Mar 31 '20 10:03 untitaker

Wow, that was a fast response! Thank you @untitaker. It's not a big inconvenience for me personally, because it's not a big lift to rewrite the task to not use autoretry_for; I just wanted to make sure that I'm not missing something on my part.

Thanks for all the work you guys are doing over here at Sentry :bowing_man:

pgrzesik avatar Mar 31 '20 10:03 pgrzesik

Just commenting that I found the same issue here, though in our case we have bind=True. So, stealing from the top example:

@app.task(bind=True, autoretry_for=(A, B))
def foo(self):
    ...  # do other things that can raise exception A, B

@app.task(bind=True, autoretry_for=(A, B))
def bar(self):
    return foo.run()

When we call bar.delay(), if foo raises A, the exception will still go to Sentry even though it will be retried.

It's not a huge problem though, since I can rewrite this to use self.retry() instead of autoretry_for.
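
For reference, a minimal sketch of what such a rewrite could look like, reusing the names from the snippet above; max_retries and countdown are standard Celery retry options, picked here only for illustration:

@app.task(bind=True, max_retries=3)
def bar(self):
    try:
        return foo.run()
    except (A, B) as exc:
        # Retry manually instead of relying on autoretry_for; once
        # max_retries is exhausted, the original exception is re-raised
        # and reported as usual.
        raise self.retry(exc=exc, countdown=30)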

olivertren avatar May 27 '20 21:05 olivertren

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

github-actions[bot] avatar Dec 23 '21 15:12 github-actions[bot]

Quick update: this issue is still open. And because multiple people are reporting this, we will keep it in mind for one of the next SDK updates.

antonpirker avatar Feb 22 '22 09:02 antonpirker

@sentrivana can we take another look at this issue?

smeubank avatar Aug 02 '23 16:08 smeubank

@sentrivana can we take another look at this issue?

Putting this in To Do so that it gets picked up soon.

sentrivana avatar Aug 03 '23 08:08 sentrivana

This issue is not specific to retrying exceptions. Rather, the Sentry SDK reports any exception raised in a function decorated with Celery's task decorator as an unhandled exception, even when the function is called as a plain function (not as a task) from within another Celery task that handles the exception. The following code sample illustrates this phenomenon:


from celery import Celery
import sentry_sdk

sentry_sdk.init(dsn="<dsn_here>")

app = Celery("tasks", broker_url="amqp://localhost:5672/")


@app.task()
def divide_by_zero():
    return 1 / 0


@app.task()
def try_dividing_by_zero():
    try:
        print("I wonder what 1 divided by zero is. Let's try it out...")
        divide_by_zero()
    except ZeroDivisionError:
        print(
            "Don't be silly! Division by zero is undefined and causes an error in Python."
        )
        print("Fortunately, I was here to handle the error!")


if __name__ == "__main__":
    try_dividing_by_zero.delay()

If you save the above snippet as tasks.py and then run the script, you will observe from the output that the ZeroDivisionError gets handled by try_dividing_by_zero. However, if you log into your Sentry project, you will see that the ZeroDivisionError was reported as an unhandled exception.

szokeasaurusrex avatar Aug 29 '23 14:08 szokeasaurusrex

We have decided to close this issue for now, since the fix would likely be quite complicated to implement and there are two relatively simple workarounds:

Workaround specific to autoretry_for

For those who are experiencing issues with auto-retried exceptions being reported to Sentry when the exception occurs in a Celery task that has been called as a function, you can work around the issue by raising Retry instead of using autoretry_for per @untitaker's comment:

We generally don't have support for autoretry_for, but we should fix that. If you raise Retry instead it should work fine

General workaround

Another option to prevent handled exceptions from being reported to Sentry as unhandled exceptions, which works regardless of whether the issue arises while using autoretry_for, is to simply avoid calling Celery tasks as functions. Instead, create a normal function (i.e. not a Celery task) that does what you would like your function/Celery task to do, and then create a separate Celery task that simply calls your normal function.

For example, if you have the following Celery task that you also sometimes call as a normal function...

@app.task()
def divide_by_zero():
    1 / 0


# Elsewhere in the codebase, we call divide_by_zero as a Celery task.
# The exception will be reported to Sentry, as expected.
divide_by_zero.delay()  


# Somewhere else (within a Celery task), we call divide_by_zero, but we handle the exception.
# However, the exception is incorrectly reported to Sentry as an unhandled exception due to this issue. 
try:
    divide_by_zero() 
except ZeroDivisionError:
    pass

...you can replace it with the following to ensure no exceptions are incorrectly reported to Sentry because of this bug:

def divide_by_zero_normal_func():  # Normal function (not Celery task) with desired logic
    1 / 0

@app.task()
def divide_by_zero_celery_task():  # Celery task, which simply calls the normal function
    divide_by_zero_normal_func()


# Elsewhere in the codebase, we call divide_by_zero_celery_task as a Celery task.
# The exception will be reported to Sentry, as expected.
divide_by_zero_celery_task.delay()  


# Somewhere else (within a Celery task), we call divide_by_zero_normal_func, but we handle the exception.
# Since divide_by_zero_normal_func is a normal function, not a Celery task, no exception is reported to Sentry, as desired.
try:
    divide_by_zero_normal_func() 
except ZeroDivisionError:
    pass

szokeasaurusrex avatar Sep 04 '23 13:09 szokeasaurusrex

Are you really closing an issue based on two recommended workarounds which do not work for simple scenarios? I mean, the first recommendation is literally a one-liner with no context. You raise Retry, but how do you deal with retry limits? How do you do exponential backoff? As for the second, I don't want to handle the exception and pass; I want to either retry it and not get a Sentry error notification, or, once retries are exhausted, get an error notification. How do you achieve that?

I get that this may be harder to implement than it looks, and we're all busy people who have to manage priorities, but can we not sweep this under the rug? I can be sad and accept that it won't be fixed soon, but I think it'd be more honest to the community if this were kept open; it's still an unfixed bug.

HoneyryderChuck avatar Sep 08 '23 21:09 HoneyryderChuck

@HoneyryderChuck I think your feedback is spot on. For some context this got bumped because it was an example I mentioned in a comment I left on HN for @dcramer (https://news.ycombinator.com/item?id=36971490#36972263).

It's definitely disappointing that it just got closed after that, it would have been better just to leave it alone 🤦.

YPCrumble avatar Sep 14 '23 00:09 YPCrumble

Hey @HoneyryderChuck and @YPCrumble, thanks for your feedback. You are right; it's better to keep this open so other people can find it if they encounter the same problem.

For transparency, here are some notes (a tl;dr) from our internal discussions/investigations:

  • This is edge-casey: if you use the task decorator, this is what happens.
  • What Daniel reported is a more general problem (the value of fixing it is questionable).
  • We might be able to fix the autoretry behavior separately and set the broader problem aside.
  • If we do want to fix it:
    • We need to remove the task.run/__call__ patches. Patching __call__ is a can of worms because Celery is very magical and we have to live with Celery's assumptions and magic.
    • Capture the exception slightly higher up, in celery/concurrency/* (i.e. where a worker executes a task from the queue, not in the task function itself).
    • For instance here: https://github.com/celery/celery/blob/b96ab282a8a2ea3d97d034f862e9fd6aceb0a0b5/celery/concurrency/base.py#L139-L155
    • This would need to be tested thoroughly; there is a risk of breaking existing, working behavior.

antonpirker avatar Sep 15 '23 06:09 antonpirker

Closing because the clarifying comment kind of makes everyone happy (and if not happy, at least not sad anymore).

antonpirker avatar Nov 13 '23 08:11 antonpirker

Asking us to raise Retry manually removes the benefit and simplicity of autoretry_for in Celery, which is that you don't need to explicitly catch the exceptions, like a RequestError in a Celery job that makes an outbound request. This "workaround" means updating every single piece of application code wrapped in autoretry_for. I don't believe this is an acceptable reason to close it - if the answer is "it is too complex to fix" relative to our customers, please be explicit about that.

davidemerritt avatar Feb 05 '24 20:02 davidemerritt

+1 - this is a basic technique within Celery that allows for generalized exception catching. Not supporting this means that we would have to make Sentry-specific modifications to application flow and code, which is the opposite of what you're looking to do with exception tracking software.

davidemerritt avatar Mar 25 '24 15:03 davidemerritt

@davidemerritt To clarify, you can use autoretry_for with Sentry. All your exceptions will be reported to Sentry as expected 🙂

As I stated in my previous comment, this problem is, in fact, completely unrelated to autoretry_for. Rather, if you have a Celery task which raises an error, the error will get reported to Sentry even if the task was called as a function, and the exception got handled outside the task. This is an edge case that likely affects few users, and it can be easily worked around.

szokeasaurusrex avatar May 06 '24 17:05 szokeasaurusrex