Flatten `Status.tagged_with_all` scope
Previously we looped through all the supplied tag ids, appending a new condition for each one, so the generated query was a collection of AND'd `EXISTS` statements. Query times under this approach are reasonable for small tag id counts, but grow as the tag id array grows.
The update uses a `GROUP BY` / `HAVING` approach instead. Query times remain fairly constant under this approach, with new ids simply added to a single `IN` condition.
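To make the grouped shape concrete, here is a small in-memory Ruby sketch of what the `GROUP BY` / `HAVING` query does; the data and variable names are hypothetical, and plain `Enumerable` calls stand in for the SQL clauses:

```ruby
# Hypothetical statuses_tags rows as [status_id, tag_id] pairs.
rows = [
  [1, 10], [1, 20], [1, 30],
  [2, 10], [2, 20],
  [3, 10], [3, 30], [3, 40],
]
wanted = [10, 30] # the tag ids passed to the scope

matched = rows.select { |_sid, tid| wanted.include?(tid) }          # WHERE tag_id IN (10, 30)
grouped = matched.group_by { |sid, _tid| sid }                      # GROUP BY status_id
result  = grouped.select { |_sid, rs| rs.size == wanted.size }.keys # HAVING COUNT(*) = 2

result # statuses 1 and 3 carry both tags
```

Note that counting rows with `HAVING COUNT(*) = n` relies on `(status_id, tag_id)` pairs being unique in the join table; otherwise a `DISTINCT` count would be needed.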
As I was working on this, it seemed vaguely familiar and I realized there was a prior attempt at a similar change:
- Original attempt, which fixed the brakeman warning and converted the structure to a subquery approach -- https://github.com/mastodon/mastodon/pull/25882 -- we benchmarked and discovered bad performance with the subquery
- Actual change, which fixed brakeman and moved from many `INNER JOIN`s to many `EXISTS` -- https://github.com/mastodon/mastodon/pull/25941
I think the change here (by avoiding a subquery) should be more performant than the last approach, but I'd love feedback on both the perf aspects and whether there are any scenarios that need expanded spec coverage. There is coverage for various scenarios of statuses with none/some/all of the passed-in tags, but if we have edge cases I'd like to get them into specs before this is fully reviewed.
I did some lightweight local benchmarking -- I created 1000 each of Status and Tag, then looped through all statuses and assigned 25 random tags to each.
First, I found the most used tags, and passed in an increasing number of tag IDs from that collection into the query:
| Tag IDs | Old (ms) | New (ms) |
|---|---|---|
| 1 | 1.0 | 1.2 |
| 2 | 1.1 | 1.4 |
| 3 | 1.7 | 1.5 |
| 4 | 2.6 | 1.7 |
| 5 | 6.7 | 1.6 |
Then I took a subset of tags actually used by a random status and passed those in, again increasing the number each time:
| Tag IDs | Old (ms) | New (ms) |
|---|---|---|
| 1 | 1.1 | 1.3 |
| 2 | 1.1 | 1.3 |
| 3 | 1.7 | 1.3 |
| 4 | 2.6 | 1.4 |
| 5 | 4.4 | 1.5 |
All times above are in ms.
Running `EXPLAIN` on the queries makes sense of this as well. In the current approach of appending more and more `EXISTS` conditions, the explain tree keeps getting more deeply nested as IDs are added. The approach here keeps the explain structure flat and just adds IDs to the `IN` portion. So performance is similar for a small number of IDs, and starts to skew in favor of the approach here as the count grows (on comically large values it's very dramatic -- e.g., 25 tag IDs takes ~88ms on the current approach and ~2ms on this PR's approach).
If you have sample SQL queries (and Rails commands to find relevant objects to run them with), we can try running them on mastodon.social and see how the perf is.
Good idea - will add something shortly. I just realized that the change here fails to handle the "nothing passed in" case; going to add a unit spec for that and update. After that, I'll post some benchmark ideas.
Updated with a commit to just avoid calling the scope with nil values for now.
For benchmarking ... you could grab some collection of the most/least used tag ids as an array in irb, and then pass them through the original and changed method. For my local run I renamed the old one `tagged_with_all_original` and ran them side by side a bunch. You could call the new one something else, or just use the scope chain directly w/out defining a scope/method.
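As a runnable stand-in for that side-by-side run (no Rails involved -- the data and both strategies here are synthetic illustrations of the two query shapes, not the actual scopes), the comparison could look something like:

```ruby
require 'benchmark'

# Synthetic join-table rows: 1000 statuses, up to 25 random tags each.
rows = 1_000.times.flat_map do |sid|
  Array.new(25) { rand(100) }.uniq.map { |tid| [sid, tid] }
end

# Take the five most used tag ids, mirroring the benchmark setup above.
wanted = rows.map(&:last).tally.max_by(5) { |_tid, count| count }.map(&:first)

# "Old" shape: one pass over the rows per wanted tag (like chained EXISTS).
old_way = lambda do
  wanted.reduce(rows.map(&:first).uniq) do |ids, tag|
    ids & rows.select { |_sid, tid| tid == tag }.map(&:first)
  end.sort
end

# "New" shape: filter once, group, keep statuses matching every wanted tag.
new_way = lambda do
  rows.select { |_sid, tid| wanted.include?(tid) }
      .group_by(&:first)
      .select { |_sid, rs| rs.size == wanted.size }
      .keys.sort
end

Benchmark.bm(4) do |x|
  x.report('old') { 20.times { old_way.call } }
  x.report('new') { 20.times { new_way.call } }
end
```

Both lambdas must return the same status ids; only the timing should differ, which is the property the real benchmark checks against the actual scopes.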
It may be useful to run an `EXPLAIN` after each to see the full plan cascade effect as well.
Did we ever do any benchmarking on this, or do I need to provide a more detailed code snippet to use?