flask-restless icon indicating copy to clipboard operation
flask-restless copied to clipboard

Incredibly slow performance with GET_MANY on large tables (PostgreSQL)

Open LordSputnik opened this issue 9 years ago • 2 comments

When using PostgreSQL and Flask Restless, GET_MANY requests for large tables (>100K rows) can take up to a minute to return a response.

This is because Flask Restless performs a COUNT query at some point while obtaining the results. In PostgreSQL, due to multi-version concurrency control, each COUNT must (usually) scan all rows of the table, leading to consistently long response times.

By comparison, a simple

SELECT * FROM <table> OFFSET <x> LIMIT <y>

is much faster. But even when turning pagination off (I assumed the COUNT was needed for the num_results field in the JSON response), the COUNT query is still made.

Is there any way that the code can be modified to avoid a COUNT query here?

Edit: PostgreSQL 9.3, flask-restless 0.14.2

LordSputnik avatar Oct 15 '14 11:10 LordSputnik

Well if the JSON response should include the total number of rows in a paginated response (which it really should, for the sake of making metadata available to the client), it's going to need to make a count at some point, unless there's some solution I'm missing.

On the other hand, when you say "turning pagination off", I assume you mean setting results_per_page to be 0, and in this case, the way the code exists now, it doesn't really "disable" pagination, but it just seems to assume there should be one gigantic "page" containing everything (see the API._paginated() function). Are you suggesting that we should change the behavior of pagination so that if the user specifies results_per_page=0, then no count should be made (perhaps replacing a count with a len() after getting the resulting list)?

jfinkels avatar Oct 22 '14 01:10 jfinkels

I have implemented a partial solution to this problem for the case when pagination is disabled: 6d9ee974434d7b7bb87ba91c005debf65cd5f414.

To summarize, the problem is that we issue a COUNT in order to compute the pagination links (both for the Link headers and for the links element in the JSON API document). One solution is to provide the user with the ability to disable the generation of those pagination links, thereby allowing Flask-Restless to skip the counting step. So I see this now as a feature request: allow the user to disable pagination links in order to allow Flask-Restless to avoid issuing a COUNT.

jfinkels avatar Feb 12 '16 09:02 jfinkels