snuba icon indicating copy to clipboard operation
snuba copied to clipboard

Increase max resolution for jitter from 1min to 2min

Open saponifi3d opened this issue 1 year ago • 5 comments

Description

The goal of this PR is to enable the jitter code path here: https://github.com/getsentry/snuba/blob/40d2510c1ab598ba501586688b77189757089cd5/snuba/subscriptions/scheduler.py#L118-L135 for queries with a resolution up to 30 minutes.

Here's what the load profile on metric alert subscriptions looks like that we're trying to fix: https://app.datadoghq.com/dashboard/byg-ici-a72/snuba-subscriptions?fromUser=true&fullscreen_end_ts=1714765439441&fullscreen_paused=false&fullscreen_refresh_mode=sliding&fullscreen_section=overview&fullscreen_start_ts=1714751039441&fullscreen_widget=3556554584510136&tpl_var_consumer_group%5B0%5D=*&tpl_var_sentry_region%5B0%5D=us&view=spans&from_ts=1714750247173&to_ts=1714764647173&live=true

Technical Spec: https://www.notion.so/sentry/Metric-Alert-Load-Shedding-cba94dd0fa434f3bb56a23f920586eb2

saponifi3d avatar May 03 '24 21:05 saponifi3d

We can have up to 30 minute resolutions when using comparison intervals, so maybe bump to 30?

wedamija avatar May 03 '24 21:05 wedamija

Test Failures Detected: Due to failing tests, we cannot provide coverage reports at this time.

:x: Failed Test Results:

Completed 640 tests with 1 failed, 638 passed and 1 skipped.

View the full list of failed tests
Test Description Failure message
Testsuite:
pytest

Test name:
tests.clickhouse.optimize.test_optimize_scheduler::test_get_next_schedule[non parallel]

Envs:
- default
Traceback (most recent call last):
File ".../clickhouse/optimize/test_optimize_scheduler.py", line 254, in test_get_next_schedule
assert optimize_scheduler.get_next_schedule(partitions) == expected
AssertionError: assert OptimizationSchedule(partitions=[["(90,'2022-06-27')",\n "(90,'2022-06-20')",\n "(90,'2022-06-13')",\n "(90,'2022-06-08')"]],\n cutoff_time=datetime.datetime(2024, 5, 5, 0, 0),\n start_time_jitter_minutes=None) == OptimizationSchedule(partitions=[["(90,'2022-06-27')",\n "(90,'2022-06-20')",\n "(90,'2022-06-13')",\n "(90,'2022-06-08')"]],\n cutoff_time=datetime.datetime(2024, 5, 4, 0, 0),\n start_time_jitter_minutes=None)

Matching attributes:
['partitions', 'start_time_jitter_minutes']
Differing attributes:
['cutoff_time']

Drill down into differing attribute cutoff_time:
cutoff_time: datetime.datetime(2024, 5, 5, 0, 0) != datetime.datetime(2024, 5, 4, 0, 0)

codecov[bot] avatar May 03 '24 21:05 codecov[bot]

Updated the max resolution based on this convo: https://sentry.slack.com/archives/CLTE78L73/p1714771659170539 -- will see how 2min affects things and increase to size.

I'm still looking into the test failures.

saponifi3d avatar May 04 '24 16:05 saponifi3d

@saponifi3d @wedamija is this PR still relevant?

MeredithAnya avatar Dec 09 '24 19:12 MeredithAnya

@MeredithAnya i think so, we are still seeing some sawtooth in the graph for processing, this should help smooth when we execute snuba subscriptions

I can try to revive this PR and get it green, if it looks okay otherwise.

saponifi3d avatar Dec 09 '24 19:12 saponifi3d