sentry icon indicating copy to clipboard operation
sentry copied to clipboard

feat(metrics_indexer): Add rate limits functionality to indexer, take 2 [INGEST-1380]

Open untitaker opened this issue 1 year ago • 7 comments

This reverts commit a65c78a.

Reverts https://github.com/getsentry/sentry/pull/36524 Reapplies https://github.com/getsentry/sentry/pull/36263 + typing fixes

Split up in the following PRs

Merge in this order (it should be safe to deploy after each merge)

  • [x] https://github.com/getsentry/sentry/pull/37168
  • [x] https://github.com/getsentry/sentry/pull/37170
  • [x] https://github.com/getsentry/sentry/pull/37171
  • [x] https://github.com/getsentry/sentry/pull/37173
  • [ ] https://github.com/getsentry/sentry/pull/37174

untitaker avatar Jul 11 '22 17:07 untitaker

@fpacifici We have integration tests that ended up not testing anything because they are using the mock indexer (which doesn't have rate limits) together with mocked/stubbed fetch results coming from the rate limiter, and those went out of sync with the actual implementation once I added FetchTypeExt in the final round of reviews.

We want to be able to test message processing and rate limiting together, i.e. their interaction, because that's the boundary where things broke. At the same time, if we want to test this integration to death we ideally find a way to do it without hitting the database in order to keep the testsuite fast. So this suggests to me that we should figure out a way to detach rate limiting (and maybe caching as well) from the postgres_v2 backend, so it can be tested together with message processing without hitting the DB.

So let's say we build this Batch abstraction, and if I understand you right it would not handle rate limiting, and the responsibilities of indexer.bulk_record would stay the same. Then we can unittest that to death, but as usual those unittests will need to simulate whatever can be returned from the indexer (and that can go out of sync with the actual implementation again), or they hit the real postgres indexer (in which case the tests will be very slow). That's fundamentally the same problem we're dealing with already.

lmk what you think

untitaker avatar Jul 13 '22 11:07 untitaker

I think the two approaches are not incompatible. You are right we need a better way to test what comes from the indexer and rate limiter. Still testing the processing logic in isolation is not supposed to be alternative to that. While this time the issue was related with the data returned by the indexer, last time it was pure processing.

But I would be fine if you do not want to break it apart within this task.

fpacifici avatar Jul 14 '22 00:07 fpacifici

Another idea to make it harder to break and easier to test

Why don't we return a properly structured type here, the current state feels like a case of Primitive Obsession : https://github.com/getsentry/sentry/pull/36531/files#diff-4d832960b67c8c52695fce355ad53704c4c0bab03ef0fdd5c487817d383094edR162

If that returned a Mapping[str, Metadata] where Metadata is a dataclass or a named tuple with properties instead of a bare tuple, the type checker should have noticed the issue. We would not be able to do tuple unpacking. That KeyResult originally returned Mapping[str, int] so fairly harmless. Then we added metadata, then FetchType and then FetchTypeExt. Now it is likely abusing that original Mapping.

fpacifici avatar Jul 14 '22 00:07 fpacifici

@fpacifici had to do a double-take on your first post, if I understood it right then it's actually really simple and https://github.com/getsentry/sentry/pull/36669 is basically it

untitaker avatar Jul 14 '22 11:07 untitaker

@fpacifici thanks for the thorough review and for keeping coding standards high. I think I have addressed all review comments.

I can split up the PR. Should I do this before or after you review the recent changes?

untitaker avatar Jul 27 '22 11:07 untitaker

I can split up the PR. Should I do this before or after you review the recent changes?

Sorry I did not see this comment before. I think it is good to split. No need for another round. Though I will try to do another review tonight before you will wake up. If you do not see it by the time you start working, please go ahead.

fpacifici avatar Jul 28 '22 01:07 fpacifici

I split up the PR into five different ones and edited the PR description to reflect in which order they need to be merged. After that I'll rebase this PR and hopefully the diff empty, if not I guess I'll just merge it.

untitaker avatar Jul 28 '22 09:07 untitaker

This pull request has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

github-actions[bot] avatar Aug 25 '22 00:08 github-actions[bot]