fastapi-pagination icon indicating copy to clipboard operation
fastapi-pagination copied to clipboard

`Total` returns more values than expected (not applying distinct on count)

Open Riya-900 opened this issue 1 year ago • 18 comments

Hi,

I have the tables Author and Books.

Author Book
Alice A
Alice B
Bob C

In my FastAPI endpoint, when paginating the select(Author).join(Books) I want to respond with a list that has 2 dicts:

  • Alice (with her two books)
  • Bob (with his one book)

It works great but the total is off. Instead of counting the unique Authors, it counts the total rows of the query.

Is there an option to fix that?

Perhaps relevant:

  • using FastAPI (+pydantic)
  • async PostgreSQL

Riya-900 avatar May 17 '23 12:05 Riya-900

Hi @Riya-900,

Could you please change your query to smth like this:

select(Author).options(selectinload(Author.books))

uriyyo avatar May 17 '23 12:05 uriyyo

@Riya-900 Hello! Have you resolved this issue? If yes, then how did you do it?

bnku avatar Jul 06 '23 14:07 bnku

I passed the total_group_by parameter to the count_query function and called query.group_by(total_group_by) to ensure the correct calculation of total:

# file: fastapi_pagination/ext/sqlalchemy.py

def count_query(query: Select, *, use_subquery: bool = True, total_group_by=None) -> Select:
    query = query.order_by(None).options(noload("*"))

    if total_group_by:
        query = query.group_by(total_group_by)

    if use_subquery:
        return select(func.count()).select_from(query.subquery())

    return query.with_only_columns(  # noqa: PIE804
        func.count(),
        **{"maintain_column_froms": True},
    )

@uriyyo, perhaps this can help you improve your product.

bnku avatar Jul 06 '23 15:07 bnku

@bnku I guess the issue is with how you load relationships. I am not sure that we need to use group_by to get count.

uriyyo avatar Jul 06 '23 19:07 uriyyo

@Riya-900 Any updates, does this comment help you?

https://github.com/uriyyo/fastapi-pagination/issues/674#issuecomment-1551275725

uriyyo avatar Jul 14 '23 11:07 uriyyo

@uriyyo In my use case, I really need to use .join(). How can I fix this issue?

lqmanh avatar Aug 04 '23 06:08 lqmanh

Hi @lqmanh,

Could you please show an example?

uriyyo avatar Aug 04 '23 10:08 uriyyo

@uriyyo Sorry for my late reply, but I'm doing some kind of 3-way joining like this:

class Status(DeclarativeBase):
    group_id: Mapped[UUID]

class Item(DeclarativeBase):
    status: Mapped[Status] = relationship(...)
    documents: Mapped[list[Document]] = relationship(...)

class Document(DeclarativeBase):
    item_id: Mapped[UUID]
    status_group_id: Mapped[UUID]


stmt = (
    select(Item)
    .join(Item.status)
    .join(Item.documents)
    .where(Status.group_id == Document.status_group_id)  # only include documents of current status group
    .options(
        contains_eager(Item.status),
        contains_eager(Item.documents),
    )
)

UPDATED: Can fastapi-pagination support distinct count, by passing column(s) to paginate()? paginate(..., count_distinct=Item.id) for example.

lqmanh avatar Aug 15 '23 12:08 lqmanh

@lqmanh Could you try to change your query to:

stmt = (
    select(Item)
    .join(Item.status)
    .join(Item.documents)
    .options(
        contains_eager(Item.status),
        selectinload(Item.documents),  # use selectinload() to load all documents
    )
)

uriyyo avatar Aug 16 '23 11:08 uriyyo

@uriyyo ~Then how do I filter only documents associated with current status group?~ Ah sorry, misread your reply. Using selectinload or joinedload independently won't work anyway

lqmanh avatar Aug 16 '23 12:08 lqmanh

@lqmanh If you have your relationship configured correctly, it will be done automatically

uriyyo avatar Aug 16 '23 12:08 uriyyo

@lqmanh Could you please config like this:

from __future__ import annotations

from operator import and_
from typing import Any
from uuid import UUID, uuid4

from sqlalchemy import ForeignKey, create_engine, select
from sqlalchemy.orm import (
    DeclarativeBase,
    Mapped,
    joinedload,
    mapped_column,
    relationship,
    sessionmaker,
)

from fastapi_pagination import Page, Params, set_page
from fastapi_pagination.ext.sqlalchemy import paginate

engine = create_engine("sqlite:///:memory:", echo=True)


class Base(DeclarativeBase):
    pass


class Status(Base):
    __tablename__ = "statuses"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)

    item_id: Mapped[UUID] = mapped_column(ForeignKey("items.id"))

    group_id: Mapped[UUID] = mapped_column()

    item_documents: Mapped[list[Document]] = relationship(
        primaryjoin=lambda: and_(
            Document.item_id == Status.item_id,
            Document.status_group_id == Status.group_id,
        ),
        foreign_keys=[item_id, group_id],
        viewonly=True,
        uselist=True,
    )


class Item(Base):
    __tablename__ = "items"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)

    status: Mapped[Status] = relationship()
    documents: Mapped[list[Document]] = relationship()


class Document(Base):
    __tablename__ = "documents"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)

    item_id: Mapped[UUID] = mapped_column(ForeignKey("items.id"))
    status_group_id: Mapped[UUID]


Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine)

with Session() as session:
    group_id_1 = uuid4()
    group_id_2 = uuid4()

    session.add_all([
        Item(
            status=Status(
                group_id=group_id_1,
            ),
            documents=[
                Document(status_group_id=group_id_1),
                Document(status_group_id=group_id_1),
                Document(status_group_id=group_id_2),
            ],
        ),
        Item(
            status=Status(
                group_id=group_id_2,
            ),
            documents=[
                Document(status_group_id=group_id_1),
                Document(status_group_id=group_id_2),
                Document(status_group_id=group_id_2),
            ],
        ),
    ])
    session.commit()

with set_page(Page[Any]), Session() as session:
    stmt = (
        select(Item)
        .options(joinedload(Item.status).joinedload(Status.item_documents))
    )

    page = paginate(session, stmt, Params())

    print(page)

uriyyo avatar Aug 16 '23 12:08 uriyyo

@lqmanh Or you can try this config:

from __future__ import annotations

from operator import and_
from typing import Any
from uuid import UUID, uuid4

from sqlalchemy import ForeignKey, create_engine, select
from sqlalchemy.orm import (
    DeclarativeBase,
    Mapped,
    joinedload,
    mapped_column,
    relationship,
    sessionmaker,
)

from fastapi_pagination import Page, Params, set_page
from fastapi_pagination.ext.sqlalchemy import paginate

engine = create_engine("sqlite:///:memory:", echo=True)


class Base(DeclarativeBase):
    pass


class Status(Base):
    __tablename__ = "statuses"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)

    item_id: Mapped[UUID] = mapped_column(ForeignKey("items.id"))

    group_id: Mapped[UUID] = mapped_column()


class Item(Base):
    __tablename__ = "items"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)

    status: Mapped[Status] = relationship()
    documents: Mapped[list[Document]] = relationship()

    status_documents: Mapped[list[Document]] = relationship(
        secondary=Status.__table__,
        primaryjoin=lambda: Status.item_id == Item.id,
        secondaryjoin=lambda: and_(
            Document.item_id == Status.item_id,
            Document.status_group_id == Status.group_id,
        ),
        viewonly=True,
    )


class Document(Base):
    __tablename__ = "documents"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)

    item_id: Mapped[UUID] = mapped_column(ForeignKey("items.id"))
    status_group_id: Mapped[UUID]


Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine)

with Session() as session:
    group_id_1 = uuid4()
    group_id_2 = uuid4()

    session.add_all([
        Item(
            status=Status(
                group_id=group_id_1,
            ),
            documents=[
                Document(status_group_id=group_id_1),
                Document(status_group_id=group_id_1),
                Document(status_group_id=group_id_2),
            ],
        ),
        Item(
            status=Status(
                group_id=group_id_2,
            ),
            documents=[
                Document(status_group_id=group_id_1),
                Document(status_group_id=group_id_2),
                Document(status_group_id=group_id_2),
            ],
        ),
    ])
    session.commit()

with set_page(Page[Any]), Session() as session:
    stmt = (
        select(Item)
        .options(
            joinedload(Item.status),
            joinedload(Item.status_documents),
        )
    )

    page = paginate(session, stmt, Params())

    print(page)

uriyyo avatar Aug 16 '23 12:08 uriyyo

@uriyyo Looks nice, but it won't work without setting Status.item_id, which isn't true because Status-Item is one-to-many

lqmanh avatar Aug 17 '23 17:08 lqmanh

Actually, I managed to solve this issue by applying distinct() to count_query():

def count_query(query: Select, *, use_subquery: bool = True) -> Select:
    query = query.order_by(None).options(noload("*"))

    if use_subquery:
        return select(func.count()).select_from(query.distinct().subquery())

    return query.distinct().with_only_columns(  # noqa: PIE804
        func.count(),
        **{"maintain_column_froms": True},
    )

Do you think it works for all cases?

lqmanh avatar Aug 17 '23 18:08 lqmanh

Hi all,

Sorry for the long response.

Unfortunately, we can't fix this issue using distinct count cause it will break the way pagination works. You need to play around with the way you fetch relationships, it should help to solve this issue.

uriyyo avatar Nov 25 '23 12:11 uriyyo