atomic-server
Improve performance of very large collections - count expensive (and inaccurate, atm)
The /commits collection contains almost 4,000 resources. Getting a single page takes about 200ms. I think the culprit is the count field, because the server iterates over every resource in the collection. Basically, it's performing 4,000 read operations per request.
We could choose not to return the count field, but then we also couldn't tell the client how many pages there are. So no count means no max_page.
Also, this count does not take the include_external filter into account, so it is inaccurate even when it is returned.
Also, the count field might give malicious users a way to find out whether a resource they do not have access to has some attribute: perform the same query at multiple moments in time and check whether the count increased.
How can we solve this?
Don't have a count
Simple for the server, but it would mean the client has to make assumptions about pagination. Does a next page exist, for example? We'd need to change the collection model to do this.
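One common client-side heuristic, if we drop the count entirely, is to treat a full page as "there is probably a next page" and a short page as the end. This is a sketch with an illustrative page size, not the actual atomic-data client API:

```rust
// Sketch: pagination without a server-side count.
// PAGE_SIZE is an illustrative value, not the real default.
const PAGE_SIZE: usize = 30;

/// If the server returned a full page, assume another page may exist;
/// a short page signals that this was the last one.
fn probably_has_next_page(items_received: usize) -> bool {
    items_received == PAGE_SIZE
}

fn main() {
    assert!(probably_has_next_page(PAGE_SIZE));
    assert!(!probably_has_next_page(12));
    println!("ok");
}
```

The downside is that a collection whose size is an exact multiple of the page size triggers one extra, empty request.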
Hope that sled has some clever method for counting
But I would not count on this being possible
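For the record: sled's Tree::len() is documented as a full O(n) scan, so there is no built-in constant-time count. Counting the members of one collection means walking every key under its prefix, which is exactly the cost we're trying to avoid. A sketch of that prefix-scan count, using a std BTreeMap as a stand-in for the sled query_index (key shapes are illustrative):

```rust
use std::collections::BTreeMap;

// Stand-in for the sled query_index: an ordered key-value store.
// Counting one collection means scanning all keys under its prefix.
fn count_prefix(index: &BTreeMap<String, String>, prefix: &str) -> usize {
    index
        .range(prefix.to_string()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .count() // O(n) in the number of matching keys
}

fn main() {
    let mut index = BTreeMap::new();
    index.insert("commits/1".to_string(), "a".to_string());
    index.insert("commits/2".to_string(), "b".to_string());
    index.insert("drives/1".to_string(), "c".to_string());
    assert_eq!(count_prefix(&index, "commits/"), 2);
    println!("ok");
}
```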
Keep track of the count per collection
Add a new key to the query_index shaped like QueryObject-count-{number}.
Every time an atom is added or removed, increment or decrement this count.
This makes updating an atom a little slower, and the cached count might drift out of sync with the real one.
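A minimal sketch of this cached counter, with a std HashMap standing in for the sled key and hypothetical names throughout. It shows both the O(1) read and the failure mode: if a removal is ever missed (a crash between writes, say), the count stays wrong forever.

```rust
use std::collections::HashMap;

// Sketch: per-collection cached count, updated on every atom write.
// The HashMap stands in for a sled key like "QueryObject-count".
struct QueryIndex {
    counts: HashMap<String, u64>,
}

impl QueryIndex {
    fn new() -> Self {
        QueryIndex { counts: HashMap::new() }
    }

    /// Called when an atom is added to a collection.
    fn add_atom(&mut self, collection: &str) {
        *self.counts.entry(collection.to_string()).or_insert(0) += 1;
    }

    /// Called when an atom is removed. If this call is ever skipped,
    /// the cached count silently drifts from reality.
    fn remove_atom(&mut self, collection: &str) {
        if let Some(c) = self.counts.get_mut(collection) {
            *c = c.saturating_sub(1);
        }
    }

    /// O(1) read, no iteration over resources.
    fn count(&self, collection: &str) -> u64 {
        *self.counts.get(collection).unwrap_or(&0)
    }
}

fn main() {
    let mut index = QueryIndex::new();
    index.add_atom("/commits");
    index.add_atom("/commits");
    index.remove_atom("/commits");
    assert_eq!(index.count("/commits"), 1);
    println!("ok");
}
```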
Limit pagination count
We can stop counting after, say, 10 extra pages, and cap the reported count at that maximum.
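The capped count bounds the scan at a constant number of items instead of the full collection. A sketch with illustrative constants (PAGE_SIZE and MAX_EXTRA_PAGES are assumptions, not the server's real values); it returns whether the count is exact, so the client can render "300+" instead of a precise max_page:

```rust
// Sketch: stop counting after a fixed number of extra pages.
const PAGE_SIZE: usize = 30;
const MAX_EXTRA_PAGES: usize = 10;

/// Returns (count, is_exact). Scans at most cap + 1 items, so the
/// cost no longer grows with the size of the collection.
fn capped_count<I: Iterator>(items: I) -> (usize, bool) {
    let cap = PAGE_SIZE * MAX_EXTRA_PAGES;
    let n = items.take(cap + 1).count();
    if n > cap {
        (cap, false) // maxed out: report "at least cap"
    } else {
        (n, true)
    }
}

fn main() {
    // A 4000-item collection stops scanning at the cap of 300.
    assert_eq!(capped_count(0..4000), (300, false));
    // Small collections still get an exact count.
    assert_eq!(capped_count(0..42), (42, true));
    println!("ok");
}
```

As a side effect, the cap also shrinks the attack surface from the access-control concern above: an attacker can no longer watch an exact count tick up once a collection is past the cap.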
I know I've been working on this a couple of weeks ago, but I don't know in which commit or branch.