Performance issues due to offset/limit paradigm on PostgreSQL database
Issue description
This line slows down SQL queries on PostgreSQL 11 significantly, because it changes the execution plan.
Steps to reproduce the issue
- Create data
create table dummy_table_left (
id int primary key,
some_text text,
some_comment text
);
insert into dummy_table_left values
(1, 'two', 'three'),
(2, 'two2', 'three'),
(3, 'two3', 'three'),
(4, 'two4', 'three'),
(5, 'two5', 'three');
create table dummy_table_right (
id int primary key,
some_text text,
some_comment text
);
insert into dummy_table_right values
(1, 'two', 'three2'),
(2, 'two2', 'three3'),
(3, 'two3', 'three4'),
(4, 'two4', 'three5'),
(5, 'two5', 'three6');
- Compare execution plan of query
explain
select * from dummy_table_left dtl
join dummy_table_right dtr on dtl.some_text = dtr.some_text;
vs
explain
select * from dummy_table_left dtl
join dummy_table_right dtr on dtl.some_text = dtr.some_text
offset 2
limit 2;
- (or, alternatively, load more data into the tables and observe the significant increase in runtime by using explain analyze instead)
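As a quick sanity check of the test data itself (not of the planner behaviour, which is specific to the PostgreSQL optimizer), the two tables and the join can be reproduced in-memory with SQLite from Python's standard library. This is only a sketch to confirm that each row on the left matches exactly one row on the right:

```python
import sqlite3

# Stand-in for the Postgres setup above; SQLite is used only to verify the
# test data, it does NOT reproduce the Postgres planner behaviour.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table dummy_table_left (id int primary key, some_text text, some_comment text);
    create table dummy_table_right (id int primary key, some_text text, some_comment text);
    insert into dummy_table_left values
        (1, 'two', 'three'), (2, 'two2', 'three'), (3, 'two3', 'three'),
        (4, 'two4', 'three'), (5, 'two5', 'three');
    insert into dummy_table_right values
        (1, 'two', 'three2'), (2, 'two2', 'three3'), (3, 'two3', 'three4'),
        (4, 'two4', 'three5'), (5, 'two5', 'three6');
""")

rows = conn.execute("""
    select * from dummy_table_left dtl
    join dummy_table_right dtr on dtl.some_text = dtr.some_text
""").fetchall()
print(len(rows))  # each left row joins to exactly one right row -> 5
```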
What's the expected result?
- something similar to
Subquery Scan on row_number_wrapper  (cost=29.12..284.54 rows=18 width=144)
  Filter: ((row_number_wrapper.some_label > 2) AND (row_number_wrapper.some_label <= 4))
  ->  WindowAgg  (cost=29.12..230.36 rows=3612 width=148)
        ->  Hash Join  (cost=29.12..176.18 rows=3612 width=140)
              Hash Cond: (dtl.some_text = dtr.some_text)
              ->  Seq Scan on dummy_table_left dtl  (cost=0.00..18.50 rows=850 width=68)
              ->  Hash  (cost=18.50..18.50 rows=850 width=68)
                    ->  Seq Scan on dummy_table_right dtr  (cost=0.00..18.50 rows=850 width=68)
which can be achieved with (note that row_number() is 1-based, so offset 2 corresponds to some_label > 2, not >= 2)
explain
select * from (
  select row_number() over (order by 1) as "some_label", *
  from dummy_table_left dtl
  join dummy_table_right dtr on dtl.some_text = dtr.some_text
) row_number_wrapper
where some_label > 2 and some_label <= 2 + 2;
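The OFFSET/LIMIT pair maps onto 1-based row_number() bounds as rn > offset and rn <= offset + limit, i.e. OFFSET 2 LIMIT 2 selects rows 3 and 4. A small sketch of that arithmetic and of the query wrapping (the helper names are hypothetical illustrations, not part of any library API):

```python
def row_number_bounds(offset: int, limit: int) -> tuple[int, int]:
    """Translate OFFSET/LIMIT into inclusive row_number() bounds.

    row_number() starts at 1, so OFFSET 2 LIMIT 2 keeps rows 3 and 4.
    """
    low = offset + 1       # first row to keep (inclusive)
    high = offset + limit  # last row to keep (inclusive)
    return low, high


def wrap_with_row_number(inner_sql: str, offset: int, limit: int) -> str:
    """Wrap a query so pagination filters on row_number() instead of
    appending OFFSET/LIMIT, mirroring the hand-written query above."""
    low, high = row_number_bounds(offset, limit)
    return (
        "select * from ("
        'select row_number() over (order by 1) as "some_label", * '
        f"from ({inner_sql}) q"
        f") row_number_wrapper where some_label >= {low} and some_label <= {high}"
    )


print(row_number_bounds(2, 2))  # (3, 4)
```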
What's the actual result?
Limit  (cost=6.05..12.10 rows=2 width=148)
  ->  WindowAgg  (cost=0.00..10930.81 rows=3612 width=148)
        ->  Nested Loop  (cost=0.00..10876.62 rows=3612 width=140)
              Join Filter: (dtl.some_text = dtr.some_text)
              ->  Seq Scan on dummy_table_left dtl  (cost=0.00..18.50 rows=850 width=68)
              ->  Materialize  (cost=0.00..22.75 rows=850 width=68)
                    ->  Seq Scan on dummy_table_right dtr  (cost=0.00..18.50 rows=850 width=68)
Additional details / screenshot
- A nested loop join is significantly slower than the hash join. Adding the limit and offset changes the actual execution plan, so this pagination technique cannot be used with constructed/joined queries (on Postgres at least).
- Pre-saving the result into a table would create an I/O bottleneck.
- In my use case, the proposed approach with row_number() takes 300 ms, compared to 30 s with the approach currently implemented in this module.
@alexdyck Thanks, I will take a closer look and try to implement the approach you described.
@alexdyck Those who have a performance issue with the default limit-offset implementation can now use row-number pagination.
Row-number pagination is available for the sqlmodel and sqlalchemy extensions.
Usage example:
@app.get("/users", response_model=Page[UserSchema])
def route(db: Session = Depends(get_db)):
    return paginate(db, select(User), query_type="row-number")
I am planning to release a new version in 1-2 weeks.
Just to add something to the topic of LIMIT-OFFSET on Postgres: the Postgres documentation describes what @alexdyck is doing wrong:
Thus, using different LIMIT/OFFSET values to select different subsets of a query result will give inconsistent results unless you enforce a predictable result ordering with ORDER BY
You don't enforce any ordering; this is the problem. Adding ORDER BY id should make the query almost instant, because the DB can use the primary-key index to pick the requested rows.
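The determinism point can be illustrated outside Postgres. A sketch with SQLite (the planner differs from Postgres, but the ORDER BY/LIMIT/OFFSET semantics shown here are standard SQL): without ORDER BY, the subset returned by LIMIT/OFFSET is unspecified; with ORDER BY id, page 2 is deterministically rows 3 and 4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table dummy_table_left (id int primary key, some_text text);
    insert into dummy_table_left values
        (1, 'two'), (2, 'two2'), (3, 'two3'), (4, 'two4'), (5, 'two5');
""")

# With an explicit ORDER BY, OFFSET 2 LIMIT 2 always returns the same rows,
# regardless of which plan the database chooses.
page = conn.execute(
    "select id from dummy_table_left order by id limit 2 offset 2"
).fetchall()
print(page)  # [(3,), (4,)]
```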
@dexax Thanks for the explanation.
I am closing this issue, since we have found the root cause and it's not a library issue.