sentry icon indicating copy to clipboard operation
sentry copied to clipboard

Large events strategy

Open lhunath opened this issue 1 year ago • 9 comments

Problem Statement

https://develop.sentry.dev/sdk/envelopes/#size-limits Reading the docs on envelope size limits has me worried: what is the action taken by Sentry should my events reach the limit? The docs do not specify. As such, I need to assume the worst-case scenario: the server will drop the envelope and all events will be lost.

Is this correct, or will the Sentry server trim the envelope intelligently until it reaches the limit? Seeing as the network data was already consumed at this point anyway — it seems like a blind drop doesn't benefit the server but in storage.

If a drop /is/ what happens, will I learn about it somehow, such as through a drop event recorded in its stead?

Since I attach runtime logs to the event as breadcrumbs, there is a real possibility that a long or heavy runtime might lead to hitting the limit. It's quite important that I can be aware of this rather than having a blind-spot to it and silently building in a mechanism that ignores crashes from a certain segment of the user-base.

Solution Brainstorm

We do have SentryOptions.maxBreadcrumbs, an imperfect guard. The default maximum is set to 100, which feels low as a count of log statements. Of course, it is also no guarantee that it will keep the envelope within the server-defined limits. Furthermore, it is undefined by the documentation what happens when the limit is hit. Will new breadcrumbs drop old ones (sounds good) or will new breadcrumbs be dropped once the limit is reached (sounds bad).

We can use SentryOptions.beforeSend to introspect the event before send-out and assess its size. Theoretically this might be an opportunity for us to vet it against the limit and trim it manually before sending, but we have a few issues: [1] we do not know what the current limits are (we can hard-code the 1MB event limit from the doc), [2] we do not know the byte-size of the event. The best proxy is event.serialize() which yields us a dictionary, but AFAIK Sentry itself uses an internal MsgPack implementation that we cannot access, so at best we can JSON serialize or over-count with something like event.serialize().description.utf8.count.

If I had to assess who is best equipped to deal with the problem, then my list from best-to-worst would be:

  1. The Sentry server
  2. The Sentry SDK
  3. The app developer

Where the strategy from best-to-worst would be:

  1. Trim the envelope intelligently/incrementally, dropping oldest/least relevant data points first.
  2. Replace with a "envelope limit reached" event that contains some basic details.
  3. Drop the envelope silently, ignoring the events with no recourse.

Are you willing to submit a PR?

No response

lhunath avatar Apr 30 '24 12:04 lhunath

Worth mentioning https://github.com/getsentry/sentry-java/issues/910

lhunath avatar Apr 30 '24 12:04 lhunath

What's curious to note is that when I add > 100 breadcrumbs to the scope, Sentry stops invoking beforeSend:

[Sentry] [debug] [SentryCrashStackCursor_SelfThread:61] Retrieving backtrace with async swift stitching...
[Sentry] [debug] [SentryCrashStackCursor_SelfThread:78] Finished retrieving backtrace.

At this point, adding < 100 breadcrumbs halts the code in the beforeSend call-back. Adding > 100 breadcrumbs skips the callback and proceeds directly to sending the event off:

[Sentry] [debug] [SentryFileManager:321] Writing envelope to path: /Users/hubstaff/Library/Developer/CoreSimulator/Devices/🆔/data/Containers/Data/Application/🆔/Library/Caches/io.sentry/a999994c4ff1a25927f709eb50500057d6cdae95/envelopes/1714509134.833000-00000-🆔.json
[...]

I should note that I have SentryOptions.maxBreadcrumbs set to 500, and still I am seeing this odd behaviour.

That clearly suggests a bug to me, my beforeSend should not be skipped, it does important things.

As a result, it is now also impossible to use the beforeSend call-back to intelligently trim the size of the event to remain within the server-documented size limit.

lhunath avatar Apr 30 '24 20:04 lhunath

Hey @lhunath thanks for reaching out - the product docs on size limits offer more details about the handling. In short, there's an overall size limit for the whole compressed/uncompressed event, which causes events to be dropped. After that, individual fields are trimmed as applicable.

The number of events that are dropped because they exceed the overall size limit are shown as part of the "Dropped" stat on the Stats page.

Wrt. the breadcrumbs limit - yes old breadcrumbs will be removed as new ones are added (LIFO).

kahest avatar May 02 '24 12:05 kahest

What's curious to note is that when I add > 100 breadcrumbs to the scope, Sentry stops invoking beforeSend:

This is a bug - thanks for opening a separate issue, we'll investigate!

kahest avatar May 02 '24 12:05 kahest

Thank you, I regret I was unable to find this information on my own from reading the docs. Perhaps some additional pointers could be added with respect to the server behaviour & dropped events statistics, in the places where it is missing.

Unfortunately, dropped events is rather ambiguous and there's very little clarity in what the cause was for the dropped event, or what it was about. It would appear that SentrySDK is in a good position to attempt a drop recovery of some sort by uploading a minimal stub with some minimally safe information for the event, should this occur as a result of a known and recoverable situation such as envelope size limits.

At a minimum, it might be valuable to expose to the developer an onError callback to inform him of the circumstance and give him an opportunity to accommodate.

lhunath avatar May 02 '24 13:05 lhunath

Thanks for the feedback about the docs - I'll see what we can do to make this info easier to find 👍

The overall topic would involve all SDKs, API, Ingestion, etc. so I'll cycle back to the larger team and may move this issue to a different repo later

kahest avatar May 02 '24 16:05 kahest

related: https://github.com/getsentry/sentry/issues/59320

kahest avatar May 06 '24 07:05 kahest

Assigning to @getsentry/support for routing ⏲️

getsantry[bot] avatar May 06 '24 15:05 getsantry[bot]

Routing to @getsentry/product-owners-stats for triage ⏲️

getsantry[bot] avatar May 06 '24 15:05 getsantry[bot]