
No Quota Separation in Transport Layer

Open narobertson42 opened this issue 1 year ago • 3 comments

How do you use Sentry?

Sentry Saas (sentry.io)

Version

2.9.0

Steps to Reproduce

  1. Exhaust the performance units quota.

Image

  2. Attempt to send performance metrics. The SDK logs that it is being rate limited:
[sentry] WARNING: Rate-limited via x-sentry-rate-limits
[sentry] DEBUG: [Monitor] health check negative, downsampling with a factor of 1
  3. Trigger an error event. It is incorrectly affected by the rate limits:
[sentry] INFO: event processor (<function DedupeIntegration.setup_once.<locals>.processor at 0xffffa2193560>) dropped event

Expected Result

The Sentry SDK needs a more specific approach to handling rate limits for different event types, such that performance metrics are rate limited when the quota is expended, but error events are still sent to Sentry.
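
To illustrate the per-category handling requested here, the following is a minimal sketch of tracking rate limits by data category from an `X-Sentry-Rate-Limits` header value. It assumes the documented header layout of comma-separated limits shaped like `retry_after:categories:scope`, with `;`-separated categories and an empty category list meaning "all categories". The function names `parse_rate_limits` and `is_category_limited` are hypothetical and are not the SDK's actual transport code.

```python
from datetime import datetime, timedelta, timezone


def parse_rate_limits(header, now=None):
    """Yield (category, disabled_until) pairs from a rate-limit header value.

    A category of None stands for "all categories" (empty category list).
    """
    now = now or datetime.now(timezone.utc)
    for limit in header.split(","):
        parts = limit.strip().split(":")
        if len(parts) < 2:
            continue  # malformed entry, skip it
        try:
            disabled_until = now + timedelta(seconds=int(parts[0]))
        except ValueError:
            continue  # retry_after was not an integer, skip it
        categories = [c for c in parts[1].split(";") if c]
        for category in categories or [None]:
            yield category, disabled_until


def is_category_limited(disabled_until, category, now=None):
    """Return True if `category` (or "all categories") is currently limited."""
    now = now or datetime.now(timezone.utc)
    for key in (category, None):
        timestamp = disabled_until.get(key)
        if timestamp is not None and timestamp > now:
            return True
    return False


# Example: only transactions and profiles are limited, so errors may still be sent.
limits = dict(parse_rate_limits("60:transaction;profile:organization"))
can_send_error = not is_category_limited(limits, "error")
can_send_transaction = not is_category_limited(limits, "transaction")
```

With this kind of bookkeeping, an exhausted performance quota would block only the `transaction` and `profile` categories, while the `error` category stays open.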

Actual Result

KeyError: 'TriggerTestEvent' 
[sentry] INFO: event processor (<function DedupeIntegration.setup_once.<locals>.processor at 0xffffa2193560>) dropped event
 [sentry] DEBUG: [Monitor] health check negative, downsampling with a factor of 2
 [sentry] DEBUG: [Monitor] health check negative, downsampling with a factor of 3
 [sentry] DEBUG: [Monitor] health check negative, downsampling with a factor of 4
 [sentry] DEBUG: [Monitor] health check negative, downsampling with a factor of 5
 [sentry] DEBUG: [Monitor] health check negative, downsampling with a factor of 6

narobertson42 avatar Sep 26 '24 11:09 narobertson42

Hi @narobertson42, have you noticed errors actually missing in Sentry? From the log message you provided, it appears that error events are being dropped due to our deduplication, not because we are incorrectly rate-limiting your error events.

[sentry] INFO: event processor (<function DedupeIntegration.setup_once.<locals>.processor at 0xffffa2193560>) dropped event

If you are absolutely sure that your error events are actually missing, would you be able to provide a minimal reproduction of the problem?

szokeasaurusrex avatar Sep 26 '24 15:09 szokeasaurusrex

@szokeasaurusrex

No error events appeared in my Sentry dashboard until I paid to increase the performance limit, despite having plenty of error limit remaining.

Image

import sentry_sdk
import time
from sentry_sdk import start_transaction

sentry_sdk.init(
    dsn="<your_dsn>",
    traces_sample_rate=1.0,
    profiles_sample_rate=1.0,
    debug=True,  # print the SDK's [sentry] debug log lines
)


def simulate_performance_event():
    with start_transaction(op="task", name="Performance Event"):
        time.sleep(0.5)
        print("[INFO] Performance event recorded.")


def simulate_error_event():
    try:
        raise ValueError("TriggerTestEvent")
    except Exception as e:
        sentry_sdk.capture_exception(e)
        print("[INFO] Error event recorded.")


for _ in range(20):
    simulate_performance_event()

simulate_error_event()

narobertson42 avatar Sep 26 '24 15:09 narobertson42

This was a decision I made while building the backpressure solution, concerning how the health of the system is determined. Relay does not distinguish whether a rate limit is due to quota or spike protection, and in the latter case I wanted backpressure to kick in. I will think about changing this so that it only applies to transaction rate limits, or making it configurable.

In the long term, it would be great if Relay could somehow distinguish the two rate limit cases, but I don't think we can count on that anytime soon.
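
As a rough sketch of the "only transaction rate limits" option mentioned above, a health check could ignore limits on unrelated categories. Everything here is hypothetical: `is_healthy_for_backpressure`, `BACKPRESSURE_CATEGORIES`, and the `disabled_until` mapping (category name to expiry time, with `None` meaning "all categories") are illustrative names, not the SDK's current code. Treating an "all categories" limit as unhealthy is also an assumption.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical: categories whose rate limits should trigger backpressure.
# None stands for a limit that applies to all categories.
BACKPRESSURE_CATEGORIES = ("transaction", None)


def is_healthy_for_backpressure(disabled_until, worker_full=False, now=None):
    """Return False only if the worker queue is full or a relevant limit is active."""
    now = now or datetime.now(timezone.utc)
    if worker_full:
        return False
    for category in BACKPRESSURE_CATEGORIES:
        timestamp = disabled_until.get(category)
        if timestamp is not None and timestamp > now:
            return False
    return True


# Example: an active limit on errors alone does not mark the system unhealthy.
current = datetime.now(timezone.utc)
error_only = {"error": current + timedelta(seconds=60)}
healthy = is_healthy_for_backpressure(error_only, now=current)
```

Making the tuple of categories a configuration option would cover the "make it configurable" alternative.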

sl0thentr0py avatar Sep 26 '24 20:09 sl0thentr0py