Make `DedupeIntegration` more memory efficient.

Open antonpirker opened this issue 6 months ago • 4 comments

Users reported that the DedupeIntegration can use a lot of memory, because it keeps a full exception in memory to check whether it has already seen that exception.

Depending on the user's code, those exception objects can be big, because they also include the traceback and local variables (which can be huge).

The idea now is to save not the whole exception but just a hash of its important parts, and use that hash to decide whether we have seen the exception before.
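
To illustrate the idea, here is a minimal sketch of hashing an exception instead of holding on to it. The helper names `exception_fingerprint` and `capture`, and the choice of "important parts" (type name, message, and the file, line, and function of each traceback frame), are assumptions made for this example; it is not the code in this PR.

```python
import hashlib
import traceback


def exception_fingerprint(exc):
    """Hypothetical helper: digest of the exception type, message, and stack frames."""
    hasher = hashlib.sha256()
    hasher.update(type(exc).__qualname__.encode("utf-8"))
    hasher.update(str(exc).encode("utf-8", "replace"))
    for frame in traceback.extract_tb(exc.__traceback__):
        # Only file, line, and function name are hashed; frame locals are never read.
        location = f"{frame.filename}:{frame.lineno}:{frame.name}"
        hasher.update(location.encode("utf-8"))
    return hasher.hexdigest()


def capture(func):
    """Hypothetical helper: call func and return the exception it raises, if any."""
    try:
        func()
    except Exception as exc:
        return exc
    return None


exc = capture(lambda: int("not a number"))
last_seen = exception_fingerprint(exc)  # a short string, not the exception object

# Seeing the same exception again produces the same fingerprint.
print(exception_fingerprint(exc) == last_seen)
```

Storing only the hex digest means the integration would keep a fixed-size string alive rather than the traceback and every local variable it references.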

fixes https://github.com/getsentry/sentry-python/issues/3165 fixes https://github.com/getsentry/sentry-python/issues/4327

antonpirker avatar Jun 05 '25 11:06 antonpirker

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 84.58%. Comparing base (9001126) to head (d6ed7a8).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4446      +/-   ##
==========================================
- Coverage   84.60%   84.58%   -0.02%     
==========================================
  Files         158      158              
  Lines       16463    16463              
  Branches     2850     2850              
==========================================
- Hits        13928    13926       -2     
- Misses       1694     1696       +2     
  Partials      841      841              
Files with missing lines Coverage Δ
sentry_sdk/integrations/dedupe.py 87.50% <100.00%> (ø)

... and 1 file with indirect coverage changes

codecov[bot] avatar Jun 05 '25 11:06 codecov[bot]

@antonpirker can't you just id(exc)?

sl0thentr0py avatar Jun 05 '25 11:06 sl0thentr0py

I thought about that too, but garbage collection moves objects around, so id(exc) could be different when we save it and when we compare it...

antonpirker avatar Jun 05 '25 11:06 antonpirker

The id() of an object is guaranteed to never change during the life cycle of the object. So I'm going with it.
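
A small sketch of that guarantee, assuming a stand-in exception class `Boom` and a hypothetical helper `id_is_stable` that exist only for this example: the id is read, a full garbage collection is forced while the object is still referenced, and the id is read again.

```python
import gc


class Boom(Exception):
    """Stand-in exception type, for illustration only."""


def id_is_stable(obj):
    """Hypothetical helper: compare id(obj) before and after a forced collection."""
    before = id(obj)
    gc.collect()  # run a full collection while obj is still referenced by the caller
    return id(obj) == before


exc = Boom("boom")
print(id_is_stable(exc))
```

The guarantee only covers the lifetime of the object; it says nothing about what happens to that id once the object has been freed.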

antonpirker avatar Jun 06 '25 07:06 antonpirker

I will close this in favor of https://github.com/getsentry/sentry-python/pull/4809

Reason: Python reuses memory addresses after garbage collection, which makes this approach unreliable.
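
The problem can be sketched as follows. The class `Boom` and the function `ids_collide` are hypothetical names for this example, and whether a collision actually occurs depends on the interpreter and its allocator, so the sketch reports the result rather than assuming it.

```python
class Boom(Exception):
    """Stand-in exception type, for illustration only."""


def ids_collide(attempts=1000):
    """Return True if a new exception ever gets the id() of one that was just freed."""
    for _ in range(attempts):
        old = Boom("first")
        old_id = id(old)
        del old  # in CPython the refcount drops to zero and the memory is released
        new = Boom("second")  # a different exception that may land at the same address
        if id(new) == old_id:
            return True
    return False


print(ids_collide())
```

When this prints `True`, a dedupe check that only remembered `old_id` would treat the second, unrelated exception as already seen and drop it.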

antonpirker avatar Sep 17 '25 12:09 antonpirker