sentry icon indicating copy to clipboard operation
sentry copied to clipboard

Optimize transitions between grouping configs

Open lobsterkatie opened this issue 2 years ago • 0 comments

Planning

  • [x] Write initial tech spec - https://www.notion.so/sentry/Tech-Spec-Optimizing-transitions-between-grouping-configs-39e562cf8bec4aefaa79bbbd320b4aec?pvs=4
  • [ ] Figure out how to store non-rule changes so they can be walked back

Milestone 1: Only calculate old hash if new hash not found

Prep

  • [x] Move all helper functions to grouping module, add missing tests (Done in https://github.com/getsentry/sentry/pull/62067, https://github.com/getsentry/sentry/pull/62203, https://github.com/getsentry/sentry/pull/62888, https://github.com/getsentry/sentry/pull/62889, https://github.com/getsentry/sentry/pull/62856, and https://github.com/getsentry/sentry/pull/62887)
  • [x] Update background grouping to make a copy of the event, the same way secondary grouping does (done in https://github.com/getsentry/sentry/pull/62856)
  • [x] Only ever run background grouping before main grouping (done in https://github.com/getsentry/sentry/pull/63715, option removed in https://github.com/getsentry/sentry-options-automator/pull/622 and https://github.com/getsentry/sentry/pull/63793)

Refactor

  • [x] Pull all grouping logic besides migrate_off_hierarchical and update_grouping_config_if_needed from _save_error_events into new helper get_hash_values (done in https://github.com/getsentry/sentry/pull/63731)
  • [x] Move all grouping code from save_error_events to _save_aggregate (Done in https://github.com/getsentry/sentry/pull/63805)

Set up feature flagging

  • [x] Introduce feature flag for using new logic (flag added in https://github.com/getsentry/sentry/pull/64128, handler added in https://github.com/getsentry/getsentry/pull/12753)
  • [x] Wrap _save_aggregate in helper assign_to_group which includes logic for which function to call (both branches call _save_aggregate for now) (done in https://github.com/getsentry/sentry/pull/64312)

Write initial tests

  • [ ] Identify/write tests for the obvious happy path(s) with feature flag off, make sure they pass (test helper added in https://github.com/getsentry/sentry/pull/64856, tests themselves added in https://github.com/getsentry/sentry/pull/64689)
  • [ ] Adjust tests so they test both the flag-on and flag-off branches of assign_to_group (Done in https://github.com/getsentry/sentry/pull/64851)

Prep for updated logic

  • [ ] Create copies of _save_aggregate and find_existing_grouphash and call them when feature flag is on (Done in https://github.com/getsentry/sentry/pull/64851 and https://github.com/getsentry/sentry/pull/64857)
  • [ ] Remove hierarchical code from new copies of _save_aggregate and find_existing_grouphash (Done for _save_aggregate_new in https://github.com/getsentry/sentry/pull/64858 and for find_existing_grouphash_new in https://github.com/getsentry/sentry/pull/64970)
  • [ ] Do other general cleanup of _save_aggregate_new (Done in https://github.com/getsentry/sentry/pull/64974 and https://github.com/getsentry/sentry/pull/64975)
  • [ ] Update _save_aggregate tests to run on both versions of _save_aggregate

Update logic

  • [ ] Pull logic from get_hash_values_new back into _save_aggregate_new so order can be mixed
  • [ ] Change order so only primary hashes are calculated at first, and secondary hashes are only calculated if primary hash not found

Increase coverage

  • [ ] Add tests for likely edge cases (are there any?), make sure they pass both ways

Add/update metrics

  • [ ] Add metrics to track difference in outcome IRL and in a hypothetical transition period (specific metrics TBA)
  • [ ] Add metrics to dashboard

Internal Rollout

  • [ ] Turn feature flag on for sentry org
  • [ ] Turn on side effects for _save_aggregate_new and adjust logic in _assign_to_group to only call _save_aggregate_new instead of calling both _save_aggregate and _save_aggregate_new when flag and non-mobile config conditions are met (this means we're actually using the new values internally, rather than just calculating them to compare)

Wider Rollout

  • [ ] Introduce settable constant for sample rate of testing (0 by default)
  • [ ] Increase sample rate to > 0
  • [ ] Is there anything to do besides adjusting sample rate and options?

Force orgs off of mobile config (can be done later)

  • [ ] Introduce feature flag for force-upgrading people (off by default)
  • [ ] Introduce settable constant for sample rate of upgrading so not all transitions run at once (0 by default)
  • [ ] Change logic for upgrading so that it ignores do-not-upgrade option if sample rate says yes and either config is mobile

lobsterkatie avatar Dec 13 '23 06:12 lobsterkatie