
Improve performance of very large collections - count expensive (and inaccurate, atm)

Open · joepio opened this issue 3 years ago · 1 comment

The /commits collection contains almost 4,000 resources, and getting a page takes about 200 ms. I think the culprit is the count field: the server iterates over all resources that are present, so it's effectively performing 4,000 read operations.

We could choose not to return the count field, but then we also couldn't tell the client how many pages there are. So no count, no max_page.

Also, this count does not take into account the include_external filter.

Also, the count field might give malicious users a way to find out whether a resource they do not have access to has some attribute: by performing the same query at multiple moments in time and checking whether the count increased.

How can we solve this?

Don't have a count

Simple for the server, but it would mean that the client needs to make assumptions about pagination: does a next page exist, for example? We'd need to change the collection model to support this.
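One common workaround that avoids a total count entirely: fetch one item more than the page size, and use the presence of that extra item to tell the client whether a next page exists. A minimal sketch (all names are illustrative, not atomic-server's actual API):

```rust
/// A page of results plus a flag telling the client whether more pages follow.
pub struct Page<T> {
    pub items: Vec<T>,
    pub has_next: bool,
}

/// Fetch `page_size + 1` items starting at `offset`; if the extra item is
/// present, a next page exists, and we drop it from the returned items.
pub fn paginate<T: Clone>(all: &[T], offset: usize, page_size: usize) -> Page<T> {
    let slice: Vec<T> = all.iter().skip(offset).take(page_size + 1).cloned().collect();
    let has_next = slice.len() > page_size;
    let items = slice.into_iter().take(page_size).collect();
    Page { items, has_next }
}
```

This gives the client a `next` link without ever counting the full collection, but it still can't produce a max_page.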

Hope that sled has some clever method for counting

But I would not count on this being possible

Keep track of the count per collection

Add a new key to the query_index with a shape like QueryObject - count - {number}. Every time an atom is added or removed, increment or decrement this count. This makes updating an atom a little slower, and the count might drift out of sync.
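The bookkeeping could look something like this. A minimal sketch using a std BTreeMap as a stand-in for the sled-backed query_index; the key shape and method names are assumptions for illustration:

```rust
use std::collections::BTreeMap;

/// Stand-in for the query_index: one count key per collection,
/// e.g. "{query_object}-count" => number of matching atoms.
pub struct QueryIndex {
    counts: BTreeMap<String, u64>,
}

impl QueryIndex {
    pub fn new() -> Self {
        Self { counts: BTreeMap::new() }
    }

    /// Called whenever an atom is added to a collection.
    pub fn increment(&mut self, query_object: &str) {
        *self.counts.entry(format!("{query_object}-count")).or_insert(0) += 1;
    }

    /// Called whenever an atom is removed. Saturates at zero so a missed
    /// increment can't underflow, though the count may then be wrong.
    pub fn decrement(&mut self, query_object: &str) {
        let c = self.counts.entry(format!("{query_object}-count")).or_insert(0);
        *c = c.saturating_sub(1);
    }

    pub fn count(&self, query_object: &str) -> u64 {
        self.counts
            .get(&format!("{query_object}-count"))
            .copied()
            .unwrap_or(0)
    }
}
```

The "might become out of sync" risk is real: any write path that touches atoms without going through increment/decrement silently corrupts the count, so a periodic full recount (or rebuild-on-startup) would probably be needed as a safety net.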

Limit pagination count

We can stop counting after, say, 10 extra pages. If the real count is higher than that, we return the capped value.
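A capped count bounds the worst-case cost at roughly `page_size * max_extra_pages` reads instead of the full collection size. A sketch of the idea (function and parameter names are illustrative):

```rust
/// Count items from `iter`, but stop once the count exceeds
/// `page_size * max_extra_pages`. Returns `(count, capped)`; when `capped`
/// is true, the real total is at least `count`.
pub fn capped_count<I: Iterator>(
    iter: I,
    page_size: usize,
    max_extra_pages: usize,
) -> (usize, bool) {
    let cap = page_size * max_extra_pages;
    let mut n = 0;
    for _ in iter {
        n += 1;
        if n > cap {
            // Bail out early: we only promise "at least `cap`".
            return (cap, true);
        }
    }
    (n, false)
}
```

The client could render a capped result as "50+" and still show up to `max_extra_pages` page links, which covers most UI needs without an exact total.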

joepio · Jan 25 '22 16:01

I know I've been working on this a couple of weeks ago, but I don't know in which commit or branch.

joepio · Mar 23 '22 15:03