stash icon indicating copy to clipboard operation
stash copied to clipboard

Improving pagination performance by caching results of larger queries

Open neru2132 opened this issue 1 month ago • 4 comments

Spinning off my comment from #6319 since that issue has been closed.

Core problem: Pagination in tabs with lots of results can be very slow. Every page access requires performing the full query. This results in sorting p*(p+1)/2 * perPage records rather than the minimum required perPage * p when paginating through p pages.

Steps to reproduce

Here's an example from my setup with 1.7M images: https://localhost:9999/images?sortby=path&perPage=20&p=00002 2.6s https://localhost:9999/images?sortby=path&perPage=20&p=00020 2.9s https://localhost:9999/images?sortby=path&perPage=20&p=00200 3.2s https://localhost:9999/images?sortby=path&perPage=20&p=02000 6.1s https://localhost:9999/images?sortby=path&perPage=20&p=20000 19s https://localhost:9999/images?sortby=path&perPage=20&p=40000 22.8s https://localhost:9999/images?sortby=path&perPage=20&p=80000 22.8s

Now consider this query that essentially caches 100 of the previous queries for a memory cost of roughly 117 kB: https://localhost:9999/images?sortby=path&perPage=2000&p=002 3.6s https://localhost:9999/images?sortby=path&perPage=2000&p=020 6.5s https://localhost:9999/images?sortby=path&perPage=2000&p=200 19.5s https://localhost:9999/images?sortby=path&perPage=2000&p=400 23s https://localhost:9999/images?sortby=path&perPage=2000&p=800 22.9s

Of course, you can already crank up the number of results per page right now, but in some cases this may not be desirable. If you go past 10000 results per page, both UI and GQL engine will even give up at some point.

What's not possible right now is to load, for example, 2000 results into cache but only display 20 at a time (a two tiered pagination, if you want to think about it like that). If something like that were to be implemented, you could even load the adjacent "chunk(s)" in the background while the user is still browsing the currently cached "chunk" which should enable near instant switching between close by pages. After that, only loading the initial page and loading far apart pages would still take a long time, but probably not noticeably longer than it does right now. Maybe there's a flaw in my logic, though.

Expected behaviour

When paginating, the user should not have to wait for repeatedly sorting the same results.

neru2132 avatar Dec 09 '25 16:12 neru2132

So basically pre-fetching? I'm not sure if caching is worth the effort especially as you're going to run into a LOT of cache invalidation issues. Changing sort will break it, changing any filters will break cache and it can only live for so long. Also Apollo already has caching built in.

pre-fetching would basically be frontloading your query with with the slowdown since it also has to load thumbnails

I would recommend increasing your SQL cache size instead and see what that does for your performance, since your response times seem to point towards a SQLite bottleneck or a result of the photos in those sets

feederbox826 avatar Dec 10 '25 06:12 feederbox826

Changing STASH_SQLITE_CACHE_SIZE does not have any effect.

I'm of course happy for any alternative solution, but the current state is a bit dire.

I don't think the upfront performance cost is all that significant. And I also don't think that it's necessary to load the thumbnails for adjacent pages.

I think it makes sense to optimize for common use cases. In my opinion it is far more intuitive if there is a wait between changing sorting options rather than changing between consecutive pages.

Take this example paginating through pages 30001 to 30010 in the default images tab: https://localhost:9999/images?sortby=path&p=30001

Image

I think it should be obvious that having to wait for more than 3 minutes just to get through 10 pages (or 400 images) is not really acceptable.

Just for comparison, here I paginate through the same images with 2000 images per page (in this case even with fetching the thumbnails): https://localhost:9997/images?sortby=path&perPage=2000&p=601

Image

As I said, all other performance considerations appear to be almost negligible. It appears to be all in the sorting.

neru2132 avatar Dec 10 '25 09:12 neru2132

I don't have images to replicate but I think it might be unique or limited to images, I can fetch 1000 scenes in 1.93s. This can come down to so many factors like disk speed, indicies (or lack thereof).

DEBUG/TRACE logs would be infinitely more helpful that more screenshots of gql inspector. Could be possible with a plugin since you can take advantage of apollo's caching

feederbox826 avatar Dec 10 '25 10:12 feederbox826

I don't really think the performance numbers are up for debate. WP has mentioned as much in https://github.com/stashapp/stash/pull/6370#issuecomment-3616037713. There are other avenues currently being pursued to tackle the problem. This issue is just meant to shine a light on the inherent problem of performing a complex query from scratch for each page when you have already calculated most of the computational load for the previous page.

Edit: The "bug report" label is incorrect. It is what the submitter of the parent issue chose initially, so I followed suit. Like that issue, the label should be changed to "improvement", but I don't know if I can do that myself.

neru2132 avatar Dec 10 '25 14:12 neru2132