
Implement cursor based pagination in PiccoloCRUD

Open dantownsend opened this issue 4 years ago • 8 comments

Currently, only offset-based pagination is supported in PiccoloCRUD (using the __page and __page_size query params). There are issues with this approach, as outlined in this article.

Modify PiccoloCRUD so it accepts a __cursor GET parameter. The value will be the ID of a row for now, but encoded as a string. Encoding it as a string gives us more flexibility (for example, we may want to return a UUID instead of an integer in the future).

The response should then be something like:

{
    "rows": [...],
    "next_cursor": "1234"
}

If both the __page and __cursor GET params are passed to an endpoint, an error should be returned.
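For illustration, a client loop might look something like this (a rough sketch - the /api/movie/ URL, the use of httpx, and the page size are just placeholder choices, and it assumes an empty next_cursor signals the last page, as proposed further down the thread):

import httpx


async def fetch_all_rows(url: str = "http://localhost:8000/api/movie/") -> list:
    rows = []
    cursor = ""
    async with httpx.AsyncClient() as client:
        while True:
            params = {"__page_size": 10}
            if cursor:
                params["__cursor"] = cursor
            response = await client.get(url, params=params)
            data = response.json()
            rows.extend(data["rows"])
            cursor = data.get("next_cursor", "")
            if not cursor:
                # No cursor returned - we've reached the last page.
                break
    return rows


# e.g. rows = asyncio.run(fetch_all_rows())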

dantownsend · Jan 21 '21

Related - https://github.com/piccolo-orm/piccolo/issues/47

dantownsend · Jan 21 '21

Hi Daniel, I’ve never encountered cursor pagination before, but it looks interesting to me. I played around with it a bit, and it seems to work, if I haven't missed the point. I changed a bit of code in PiccoloCRUD endpoints.py.

  1. Adding a cursor model to serializers.py:
# cursor model
class Cursor(pydantic.BaseModel):
    next_cursor: str
  2. Adding a cursor field to the Param class: cursor: str = ""
  3. Adding cursor handling to the _split_params method:
if key == "__cursor":
    try:
        cursor = str(value)
    except ValueError:
        logger.info(f"Unrecognised __cursor argument - {value}")
    else:
        response.cursor = cursor
    continue
  4. Adding cursor pagination to _get_all, plus base64 encoding and decoding for the next_cursor value:
async def _get_all(self, params: t.Optional[t.Dict[str, t.Any]] = None) -> Response:

    # rest of unchanged code

    # Pagination
    cursor = split_params.cursor
    page_size = split_params.page_size or self.page_size
    page = split_params.page

    # raise an error if both __page and __cursor are in the query parameters
    if "__cursor" in params and "__page" in params:
        return JSONResponse(
            {
                "error": "Can't have both __page and __cursor as query parameters",
            },
            status_code=403,
        )

    # Cursor pagination

    # fetch page_size + 1 rows, so we can tell whether another page exists
    query = query.limit(page_size + 1)
    rows = await query.run()
    # initial cursor (guard against an empty result set)
    next_cursor = self.encode_cursor(str(rows[-1]["id"])) if rows else ""
    if cursor:
        # cursor passed as a query parameter
        next_cursor = self.decode_cursor(cursor)
        # apply the ordering from the query parameters
        if order_by and order_by.ascending:
            query = query.where(self.table.id >= int(next_cursor)).limit(
                page_size + 1
            )
        else:
            query = query.where(self.table.id <= int(next_cursor)).limit(
                page_size + 1
            )
        rows = await query.run()
        # reset the cursor to the latest value from the rows; if there are
        # no further results, return an empty string as next_cursor
        next_cursor = (
            ""
            if len(rows) <= page_size
            else self.encode_cursor(str(rows[-1]["id"]))
        )

    # LimitOffset pagination

    # if page_size is greater than max_page_size, return an error
    if page_size > self.max_page_size:
        return JSONResponse(
            {
                "error": "The page size limit has been exceeded",
            },
            status_code=403,
        )
    query = query.limit(page_size)
    if page > 1:
        offset = page_size * (page - 1)
        query = query.offset(offset)
    rows = await query.run()
    # We need to serialise it ourselves, in case there are datetime
    # fields.
    cursor_model = Cursor(next_cursor=next_cursor).json()
    json = self.pydantic_model_plural(include_readable=include_readable)(
        rows=rows
    ).json()
    # If cursor pagination was used, splice next_cursor into the serialised
    # rows JSON.
    return (
        CustomJSONResponse(f"{json[:-1]}, {cursor_model[1:]}")
        if "__cursor" in params
        else CustomJSONResponse(json)
    )

###########################################################################

def encode_cursor(self, cursor):
    cursor_bytes = cursor.encode("ascii")
    base64_bytes = base64.b64encode(cursor_bytes)
    return base64_bytes.decode("ascii")

def decode_cursor(self, cursor):
    base64_bytes = cursor.encode("ascii")
    cursor_bytes = base64.b64decode(base64_bytes)
    return cursor_bytes.decode("ascii")
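
For reference, the helpers are just a base64 round trip of the string ID; a stand-alone sketch with concrete values:

import base64


def encode_cursor(cursor: str) -> str:
    # "1234" -> "MTIzNA=="
    return base64.b64encode(cursor.encode("ascii")).decode("ascii")


def decode_cursor(cursor: str) -> str:
    # "MTIzNA==" -> "1234"
    return base64.b64decode(cursor.encode("ascii")).decode("ascii")


assert encode_cursor("1234") == "MTIzNA=="
assert decode_cursor(encode_cursor("1234")) == "1234"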

Sorry for the long comment, but what do you think about this? If it's OK, I can make a PR; if not, just forget it :). Cheers.

sinisaos · Feb 04 '21

@sinisaos Great job!

I haven't used cursor-based pagination before, but I do know it's the recommended approach, as limit + offset isn't very performant in Postgres, and there are weird edge cases where you can miss rows or get duplicate rows.
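
Roughly, the difference in query shape is something like this (just a sketch with a made-up Movie table and plain Piccolo queries, not PiccoloCRUD itself, assuming an integer id primary key and a configured engine):

from piccolo.columns import Serial, Varchar
from piccolo.table import Table


class Movie(Table):
    # Made-up table, just for illustration.
    id = Serial(primary_key=True)
    name = Varchar()


# Offset pagination: the database still has to scan and throw away the first
# `offset` rows, and rows inserted/deleted in the meantime can shift the
# pages, so items get skipped or repeated.
offset_query = Movie.select().order_by(Movie.id).limit(10).offset(5000)

# Cursor (keyset) pagination: seek straight to the last id the client saw,
# so the cost stays roughly constant and the page boundaries are stable.
cursor_query = (
    Movie.select().where(Movie.id > 5000).order_by(Movie.id).limit(10)
)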

If you make a pull request, I'll have a play around. I'm interested to try it out! Thanks.

dantownsend · Feb 04 '21

@dantownsend Thank you. I'll do it later tonight.

sinisaos · Feb 04 '21

Any updates on the implementation here? :)

satishdash · Aug 05 '21

@satishdash All progress is in #34, but there has been no activity lately. If you have any code or ideas on how to improve it, feel free to contribute and make a pull request.

sinisaos · Aug 05 '21

@satishdash Yeah, it has stalled a bit.

Cursor-based pagination is really complex - @sinisaos did a great job of getting it to work, but I still struggle to wrap my head around it entirely! Here is the implementation:

https://github.com/piccolo-orm/piccolo_api/pull/34

The problem is that it adds a lot of logic to PiccoloCRUD. What I'd like to do is somehow move this logic into a separate paginator class, like Django REST Framework does:

https://github.com/encode/django-rest-framework/blob/98e56e0327596db352b35fa3b3dc8355dc9bd030/rest_framework/pagination.py#L577

That way we can add the new functionality, whilst keeping PiccoloCRUD maintainable. But I haven't been able to make much progress on it recently.
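
Something along these lines is what I have in mind - purely a sketch, none of these names exist anywhere yet:

import base64
import typing as t

from piccolo.table import Table


class CursorPaginator:
    """
    Hypothetical helper owning all of the cursor logic, so PiccoloCRUD
    itself stays small. Assumes an integer ``id`` primary key.
    """

    def __init__(self, table: t.Type[Table], page_size: int = 15):
        self.table = table
        self.page_size = page_size

    def encode_cursor(self, value: str) -> str:
        return base64.b64encode(value.encode("ascii")).decode("ascii")

    def decode_cursor(self, value: str) -> str:
        return base64.b64decode(value.encode("ascii")).decode("ascii")

    def apply(self, query, cursor: str):
        """
        Constrain a select query to the page after ``cursor``, fetching one
        extra row so we can tell whether another page exists.
        """
        if cursor:
            last_id = int(self.decode_cursor(cursor))
            query = query.where(self.table.id > last_id)
        return query.order_by(self.table.id).limit(self.page_size + 1)

    def next_cursor(self, rows: t.List[t.Dict[str, t.Any]]) -> str:
        """
        Return the cursor for the next page, or an empty string if there
        isn't one. ``rows`` is the result of the query built by ``apply``.
        """
        if len(rows) <= self.page_size:
            return ""
        return self.encode_cursor(str(rows[self.page_size - 1]["id"]))

_get_all would then just call paginator.apply(query, cursor), serialise rows[:page_size], and include paginator.next_cursor(rows) in the response, rather than carrying all of that logic itself.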

dantownsend · Aug 05 '21

@dantownsend This is done in a separate package. You can close this issue.

sinisaos · Sep 22 '22