shimmie2
shimmie2 copied to clipboard
Optimisations for search queries
Disclaimer: I'm not experienced with SQL, so feel welcome to correct any mistaken assumptions I've made.
The first commit prevents tag queries from selecting all columns when only the id is being used, improving the speed of the join operations. The second commit prevents all columns being fetched when only the total number of rows is relevant.
These are both improvements by themselves, although the first is also a useful step towards other optimisations, especially those needed to make INTERSECT and UNION operations efficient, which are necessary for https://github.com/shish/shimmie2/issues/292
So I was looking at this and thinking "it's a great idea, something feels more complicated than needed though" - after much thinking and failing to put my finger on it, I opened up my editor and tried to figure out how I would write this code if I were doing it from scratch. But before I'd actually managed to write anything, Copilot had auto-completed what appears to be an even simpler and faster approach than either of our ideas :O The result is at #1090
a useful step towards other optimisations, especially those needed to make INTERSECT and UNION operations efficient, which are necessary for https://github.com/shish/shimmie2/issues/292
I wonder about this though - if your approach is a little bit more complicated, but opens the door to useful features in a way that #1090 doesn't, then I would go with this approach 👀
I'm ready to upload my disjunctive ('OR') code as a draft pull request for reference, although be aware that I based it over the two commits in this merge request so I will need to force commit to it. (update: #1092)
By using images.id
instead of images.*
until a final JOIN INNER, I'm able to significantly improve the speed, so I believe the co-pilot approach can be adapted but it shouldn't be considered a replacement for the [core] only select id column on tag queries
commit in this merge request until it uses .id
.
Using .id only also allows me to avoid using INTERSECT, in favour of the much faster AND [x] IN ([y]), when factoring in disjunctive results. This reduces the query time by orders of magnitude, so I consider it necessary.
Marked as draft to avoid this being merged as-is:
- If #1090 is merged, this will need to be resubmitted.
- If #1153 is merged, this will need to be refactored.
This code should be rewritten and resubmitted after those pull requests are accepted.