
Exceptions are not consistently cleaned up after being passed to capture_exception()

Open · jfmontanaro opened this issue 3 years ago · 2 comments

How do you use Sentry?

Sentry Saas (sentry.io)

Version

1.5.10

I've confirmed this happens with both Python 3.10.0 and 3.10.4.

Steps to Reproduce

  1. Raise an exception
  2. Catch the exception
  3. Capture with sentry_sdk.capture_exception()
  4. Do nothing for a while (e.g. sleep)

To test, I used the following sample that prints to console when an exception is created or destroyed:

import os
import time
import sentry_sdk


class SomeException(Exception):
    count = 1
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.instance = SomeException.count
        SomeException.count += 1
        print(f'SomeException created: {self.instance}.')

    def __del__(self):
        print(f'SomeException destroyed: {self.instance}')


def test():
    try:
        raise SomeException
    except Exception as e:
        sentry_sdk.capture_exception(e)


sentry_sdk.init(
    dsn=os.environ['SENTRY_DSN'],
    environment='development',
)

for i in range(10):
    test()
# sentry_sdk.flush()  # uncomment to flush queued events before sleeping

print('Waiting...')
time.sleep(10)
print('Done.')

Expected Result

Exception objects should be dropped and destroyed as soon as Sentry flushes its events.

Actual Result

Running the sample as-is produces the following output:

SomeException created: 1.
SomeException created: 2.
SomeException created: 3.
SomeException destroyed: 2
SomeException created: 4.
SomeException created: 5.
SomeException created: 6.
SomeException created: 7.
SomeException created: 8.
SomeException created: 9.
SomeException created: 10.
Waiting...
Done.
SomeException destroyed: 1
SomeException destroyed: 4
SomeException destroyed: 5
SomeException destroyed: 6
SomeException destroyed: 7
SomeException destroyed: 8
SomeException destroyed: 9
SomeException destroyed: 3

If I uncomment the manual call to flush(), exceptions #4-9 are dropped before the sleep() call instead of after it. Exceptions #1 and #3 are still not dropped until the program exits.

In no case does exception #10 (the most recent one raised) print its __del__ message; I'm not sure why that is.

I understand that Sentry maintains a queue internally, so I expect these objects to live until the queue is cleared out. What I find surprising is that a) the first exception (and apparently the third as well?) is not dropped until the program exits, and b) if there are outstanding events in the queue, they aren't automatically flushed after some period of time. I would expect a periodic flush (say, every 5-20 seconds) to catch situations where a single error is raised once and never repeated. However, I've tried sleeping for up to 5 minutes and the queue never seems to flush.

This behavior (holding onto raised exceptions) is a problem for us because some of the locals referenced by the traceback have resources that need to be released.
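
To make concrete what I mean by a periodic flush, here is a minimal sketch built only on the public sentry_sdk.flush() API; the interval and timeout values are arbitrary, and the periodic_flush helper is mine, not part of the SDK:

import threading
import sentry_sdk


def periodic_flush(interval=10.0):
    # Drain whatever events are currently queued, waiting at most 2 seconds.
    sentry_sdk.flush(timeout=2.0)
    # Re-arm the timer; daemon=True so it doesn't keep the process alive.
    timer = threading.Timer(interval, periodic_flush, args=(interval,))
    timer.daemon = True
    timer.start()


periodic_flush()

That is obviously a workaround rather than a fix, since every consumer of the SDK would have to wire it up themselves.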

jfmontanaro avatar Apr 21 '22 21:04 jfmontanaro

Hey @jfmontanaro

Thanks for this amazing report. Off the top of my head I don't know why the SDK behaves this way; it will need some digging. I've put it on the backlog, but I will only be able to look at this after PyCon, so late next week at the earliest.

antonpirker avatar Apr 22 '22 10:04 antonpirker

Thanks @antonpirker! Appreciate your quick response. :)

No rush; we are able to work around the issue in the specific case where we've run into it, so it isn't urgent. Have fun at PyCon!

jfmontanaro avatar Apr 22 '22 16:04 jfmontanaro

Hi @jfmontanaro, thank you for submitting this issue. After taking a closer look, we have decided to close the issue for now, since we believe the behavior you have described does not indicate a bug in the SDK.

Specifically, it appears that most of the exceptions are only cleared after the sleep call because the garbage collector only runs automatically after the sleep timeout. If you trigger garbage collection manually via gc.collect() immediately before calling sleep, you will observe that all the exceptions except the last one are cleared before sleeping (even without a call to sentry_sdk.flush()). If you instead place the gc.collect() call within the exception handler, below the capture_exception call, you will observe that immediately after the $n$-th exception is captured, the $(n-1)$-th exception is cleared. This indicates that the SDK maintains a reference only to the most recently captured exception; the delay in clearing the earlier exceptions is fully explained by the Python garbage collector's internal workings, which we have no control over.
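
For concreteness, here is a minimal sketch of the second experiment, reusing the SomeException class from your reproduction and placing gc.collect() inside the exception handler:

import gc
import sentry_sdk


def test():
    try:
        raise SomeException
    except Exception as e:
        sentry_sdk.capture_exception(e)
        # Forcing a collection here makes the previously captured SomeException
        # print its __del__ message as soon as the current one is captured.
        gc.collect()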

More fundamentally, we would like to make clear that sentry_sdk.capture_exception provides no explicit guarantee of when an exception passed to it will be cleaned up, and we recommend against relying on the method to release any exception at a given point in time. In other words, even though we currently appear to maintain a reference only to the most recently captured exception, this behavior is internal to the SDK and subject to change at any time.

We would need to investigate further to determine why we hold onto a reference to the most recent exception, and whether it is possible, and how big a change it would be, to guarantee that we never produce long-lived references to exceptions passed to capture_exception. However, as long as you are able to work around this issue, we believe the current behavior of holding onto a reference to the last exception is acceptable and does not amount to a bug in the SDK, since the memory penalty is only $\mathcal{O}(1)$. If your exception's traceback references resources that need to be released, we would recommend passing a copy of the exception, with the resources removed, into capture_exception; alternatively, you can continue with your current workaround.
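
To illustrate the copy-based workaround, here is a minimal sketch; the capture_detached helper is only for illustration (it is not part of the SDK), and it assumes the exception can be re-created from its args and that an event without a stack trace is acceptable:

import sentry_sdk


def capture_detached(exc):
    # Re-create the exception from its args only; the copy has no __traceback__
    # and therefore holds no references to frame locals or their resources.
    detached = type(exc)(*exc.args)
    sentry_sdk.capture_exception(detached)


try:
    raise SomeException
except Exception as e:
    capture_detached(e)
    # The original exception, its traceback, and any resources referenced by
    # frame locals can be released as soon as this handler returns.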

If the workaround is too tedious or the current behavior causes significant inconvenience, please feel free to reopen the issue; we would be happy to consider changing this behavior as an improvement.

szokeasaurusrex avatar Nov 14 '23 16:11 szokeasaurusrex