sentry-python
Celery integration captures retried exceptions that it shouldn't when one task directly calls another
I have one Celery task that directly calls another, but as a function, not as a task. The first task is auto-retried for a certain set of exceptions, and the second task is retried for a different set of exceptions. There is a task like this in my actual codebase, but here is an isolated illustration of the problem.
Consider the following two tasks: bar, which is auto-retried for exceptions A or B, and a wrapper task bat, which is auto-retried for exceptions C or D.
def foo():
    ...  # do something that might raise exception A, B, C, or D

@app.task(autoretry_for=(A, B))
def bar():
    return foo()

@app.task(autoretry_for=(C, D))
def bat():
    return bar()
Now invoke bat:

bat.delay()
Now suppose that during the execution of bat, foo() raises exception C. Even though Celery will retry the task, the exception C will be reported to Sentry.
We generally don't have support for autoretry_for, but we should fix that. If you raise Retry instead it should work fine.
Hello :wave: Is it documented that autoretry for Celery tasks is not supported? I just stumbled upon an issue with auto-retried errors that appeared in Sentry and I'm not sure if this is a bug or expected behavior.
Best regards
It's a bug, this bug :) I'll prioritize it now since you're the second person to have commented on this
Wow, that was a fast response! Thank you @untitaker. It's not a big inconvenience for me personally, because it's not a big lift to rewrite the task to not use autoretry_for; I just wanted to make sure that I'm not missing something on my part.
Thanks for all the work you guys are doing over here at Sentry :bowing_man:
Just commenting that I found the same issue here, though in our case we have bind=True. So, stealing from the top example:
@app.task(bind=True, autoretry_for=(A, B))
def foo(self):
    ...  # do other things that can raise exception A, B

@app.task(bind=True, autoretry_for=(A, B))
def bar(self):
    return foo.run()
When we call bar.delay(), if foo raises A, the exception will still go to Sentry even though it will be retried.
It's not a huge problem though, since I can rewrite this to use self.retry() instead of autoretry_for.
This issue has gone three weeks without activity. In another week, I will close it.
But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!
"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀
Quick update: This issue is still open. And because multiple people have reported this, we will keep it in mind for one of the next SDK updates.
@sentrivana can we take another look at this issue?
Putting this in To Do so that it gets picked up soon.
This issue is not specific to retrying exceptions. Rather, the Sentry SDK reports any exception raised by a function decorated with Celery's task decorator to Sentry as an unhandled exception, even when the function is called as a function (not as a task) from within another Celery task which handles the exception. The following code sample illustrates this phenomenon:
from celery import Celery

import sentry_sdk

sentry_sdk.init(dsn="<dsn_here>")

app = Celery("tasks", broker_url="amqp://localhost:5672/")


@app.task()
def divide_by_zero():
    return 1 / 0


@app.task()
def try_dividing_by_zero():
    try:
        print("I wonder what 1 divided by zero is. Let's try it out...")
        divide_by_zero()
    except ZeroDivisionError:
        print(
            "Don't be silly! Division by zero is undefined and causes an error in Python."
        )
        print("Fortunately, I was here to handle the error!")


if __name__ == "__main__":
    try_dividing_by_zero.delay()
If you save the above snippet as tasks.py, start a Celery worker for it, and then run the script to queue the task, you will observe from the worker's output that the ZeroDivisionError gets handled by try_dividing_by_zero. However, if you log into your Sentry project, you will observe that the ZeroDivisionError was reported as an unhandled exception.
We have decided to close this issue for now, since the fix would likely be quite complicated to implement and there are two relatively simple workarounds:
Workaround specific to autoretry_for

For those who are experiencing issues with auto-retried exceptions being reported to Sentry when the exception occurs in a Celery task that has been called as a function, you can work around the issue by raising Retry instead of using autoretry_for, per @untitaker's comment:

We generally don't have support for autoretry_for, but we should fix that. If you raise Retry instead it should work fine.
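For example, here is a minimal sketch of that workaround (the task name, exception class, and retry parameters are illustrative, not taken from the SDK or from this issue): the task catches the exception itself and retries via self.retry(), which raises celery.exceptions.Retry.

from celery import Celery

app = Celery("tasks")


class TransientError(Exception):  # placeholder for an exception you would have listed in autoretry_for
    pass


@app.task(bind=True, max_retries=3)
def bar(self):
    try:
        ...  # work that might raise TransientError
    except TransientError as exc:
        # self.retry() raises celery.exceptions.Retry, which tells Celery to
        # retry the task; raising Retry (rather than the original exception)
        # is the path the Sentry integration handles correctly.
        raise self.retry(exc=exc, countdown=5)

Note that retry limits and backoff then have to be handled manually (e.g. via max_retries and a countdown computed from self.request.retries), which is part of the convenience autoretry_for normally provides.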
General workaround

Another option to prevent handled exceptions from being reported to Sentry as unhandled exceptions, which works regardless of whether the issue arises while using autoretry_for, is to simply avoid calling Celery tasks as functions. Instead, create a normal function (i.e. not a Celery task) that does what you would like your function/Celery task to do, and then create a separate Celery task that simply calls your normal function.
For example, if you have the following Celery task that you also sometimes call as a normal function...
@app.task()
def divide_by_zero():
    1 / 0


# Elsewhere in the codebase, we call divide_by_zero as a Celery task.
# The exception will be reported to Sentry, as expected.
divide_by_zero.delay()

# Somewhere else (within a Celery task), we call divide_by_zero, but we handle the exception.
# However, the exception is incorrectly reported to Sentry as an unhandled exception due to this issue.
try:
    divide_by_zero()
except ZeroDivisionError:
    pass
...you can replace it with the following to ensure no exceptions are incorrectly reported to Sentry because of this bug:
def divide_by_zero_normal_func():  # Normal function (not a Celery task) with the desired logic
    1 / 0


@app.task()
def divide_by_zero_celery_task():  # Celery task, which simply calls the normal function
    divide_by_zero_normal_func()


# Elsewhere in the codebase, we call divide_by_zero_celery_task as a Celery task.
# The exception will be reported to Sentry, as expected.
divide_by_zero_celery_task.delay()

# Somewhere else (within a Celery task), we call divide_by_zero_normal_func, but we handle the exception.
# Since divide_by_zero_normal_func is a normal function, not a Celery task, no exception is reported to Sentry, as desired.
try:
    divide_by_zero_normal_func()
except ZeroDivisionError:
    pass
Are you really closing an issue based on two recommended workarounds which do not work for simple scenarios? I mean, the first recommendation is literally a one-liner with no context. If you raise Retry, how do you deal with retry limits? How do you do exponential backoff? As for the second: I don't want to handle the exception and pass; I want to either retry and not get a Sentry error notification, or get an error notification once retries are exhausted. How do you achieve that?
I get that this may be harder to implement than it looks, and we're all busy people who have to manage priorities, but can we not sweep this under the rug? I can be sad and accept that it won't be fixed soon, but I think it'd be more honest to the community if this were kept open; it's still an unfixed bug.
@HoneyryderChuck I think your feedback is spot on. For some context this got bumped because it was an example I mentioned in a comment I left on HN for @dcramer (https://news.ycombinator.com/item?id=36971490#36972263).
It's definitely disappointing that it just got closed after that, it would have been better just to leave it alone 🤦.
Hey @HoneyryderChuck and @YPCrumble, thanks for your feedback. You are right, it's better to keep this open so other people can find it if they encounter the same problem.
For transparency, here are some notes (a tl;dr) from our internal discussions/investigations:
- this is edge-casey: if you use a task decorator, this is what happens
- what Daniel reported is a more general problem (the value of fixing it is questionable)
- we might be able to fix the autoretry behavior separately and neglect this problem
- if we do want to fix it:
  - we need to remove the task.run/__call__ patches; patching __call__ is a can of worms because Celery is very magical and we have to live with Celery's assumptions and magic (see the sketch after these notes)
  - instead, capture the exception slightly higher up, in celery/concurrency/* (i.e. when a worker executes a task from the queue, rather than in the task function itself), for instance here: https://github.com/celery/celery/blob/b96ab282a8a2ea3d97d034f862e9fd6aceb0a0b5/celery/concurrency/base.py#L139-L155
  - this would need to be tested thoroughly; there is a risk of breaking existing working behavior
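To illustrate the first point, here is a rough, simplified sketch of why wrapping a task's run/__call__ captures exceptions even when the task is invoked as a plain function from another task. This is not the actual sentry_sdk code, just an assumed illustration of the mechanism:

import sentry_sdk


def wrap_task_run(run):
    # Hypothetical wrapper standing in for the SDK's run/__call__ patch.
    def wrapped(*args, **kwargs):
        try:
            return run(*args, **kwargs)
        except Exception as exc:
            # At this point the wrapper cannot know whether a caller higher up
            # the stack (e.g. the outer task's autoretry_for or a try/except)
            # will handle or retry the exception, so it captures and re-raises.
            sentry_sdk.capture_exception(exc)
            raise

    return wrapped

Because the inner task's run is wrapped this way, a direct call from another task still passes through the wrapper, which is why a handled or retried exception ends up in Sentry.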
Closing because the clarifying comment kind of makes everyone happy (and if not happy, at least not sad anymore).
Asking us to use the raise functionality removes the benefit and simplicity of autoretry_for in Celery, which lets you avoid explicitly catching exceptions like a RequestError in a Celery job that makes an outbound request. This "workaround" means updating every single piece of application code wrapped in autoretry_for. I don't believe this is an acceptable reason to close it; if the answer is "it is too complex to fix relative to the value for our customers", please be explicit about that.
+1 - this is a basic technique within Celery that allows for generalized exception catching. Not supporting this means that we would have to make Sentry-specific modifications to application flow and code, which is the opposite of what you're looking to do with exception-tracking software.
@davidemerritt To clarify, you can use autoretry_for with Sentry. All your exceptions will be reported to Sentry as expected 🙂

As I stated in my previous comment, this problem is, in fact, completely unrelated to autoretry_for. Rather, if you have a Celery task which raises an error, the error will get reported to Sentry even if the task was called as a function and the exception got handled outside the task. This is an edge case that likely affects few users, and it can be easily worked around.