piccolo_api
piccolo_api copied to clipboard
Implement cursor based pagination in PiccoloCRUD
Currently, only offset based pagination is supported in PiccoloCRUD (using the __page and __page_size query params). There are issues with this, as outlined in this article.
Modify PiccoloCRUD so it accepts a __cursor GET parameter. The value will be the ID of a row for now, but encoded as string. Encoding it as a string gives more flexibility (for example, we may want to return a UUID instead of an integer in the future).
The response should then be something like:
{
"rows": [...],
"next_cursor: "1234"
}
If both the __page and __cursor GET params are passed to an endpoint, an error should be returned.
Related - https://github.com/piccolo-orm/piccolo/issues/47
Hi Daniel
I’ve never encountered cursor pagination before, but it looks interesting to me. I played around with it a bit, and it seems to work if I didn’t miss the point. I changed a bit of code in PiccoloCRUD endpoints.py.
- adding cursor model to
serializers.py
# cursor model
class Cursor(pydantic.BaseModel):
next_cursor: str
- adding cursor to
Paramclasscursor: str = "" - adding cursor to
_split_paramsmethod
if key == "__cursor":
try:
cursor = str(value)
except ValueError:
logger.info(f"Unrecognised __cursor argument - {value}")
else:
response.cursor = cursor
continue
- adding cursor pagination and base64 encoding and decoding for next_cursor value
async def _get_all(self, params: t.Optional[t.Dict[str, t.Any]] = None) -> Response:
# rest of unchanged code
# Pagination
cursor = split_params.cursor
page_size = split_params.page_size or self.page_size
page = split_params.page
# raise error if both __page and __cursor in query parametars
if "__cursor" in params and "__page" in params:
return JSONResponse(
{
"error": "Can't have both __page and __cursor as query parametars",
},
status_code=403,
)
# Cursor pagination
# query where limit is equal to page_size plus one
query = query.limit(page_size + 1)
rows = await query.run()
# initial cursor
next_cursor = self.encode_cursor(str(rows[-1]["id"]))
if cursor:
# query parametar cursor
next_cursor = self.decode_cursor(cursor)
# querying by query parametar order
if order_by and order_by.ascending:
query = query.where(self.table.id >= int(next_cursor)).limit(
page_size + 1
)
else:
query = query.where(self.table.id <= int(next_cursor)).limit(
page_size + 1
)
rows = await query.run()
# reset cursor to new value from latest value from rows
# if no more further results provide empty string as next_cursor
next_cursor = (
""
if len(rows) <= page_size
else self.encode_cursor(str(rows[-1]["id"]))
)
# LimitOffset pagination
# if page_size greater than max_page_size return error
if page_size > self.max_page_size:
return JSONResponse(
{
"error": "The page size limit has been exceeded",
},
status_code=403,
)
query = query.limit(page_size)
if page > 1:
offset = page_size * (page - 1)
query = query.offset(offset).limit(page_size)
rows = await query.limit(page_size).run()
# We need to serialise it ourselves, in case there are datetime
# fields.
cursor_model = Cursor(next_cursor=next_cursor).json()
json = self.pydantic_model_plural(include_readable=include_readable)(
rows=rows
).json()
return (
CustomJSONResponse(f"{json[:-1]}, {cursor_model[1:]}")
if "__cursor" in params
else CustomJSONResponse(json)
)
###########################################################################
def encode_cursor(self, cursor):
cursor_bytes = cursor.encode("ascii")
base64_bytes = base64.b64encode(cursor_bytes)
return base64_bytes.decode("ascii")
def decode_cursor(self, cursor):
base64_bytes = cursor.encode("ascii")
cursor_bytes = base64.b64decode(base64_bytes)
return cursor_bytes.decode("ascii")
Sorry for the long comment, but what do you think about this? If it's OK I can do PR, if it's not, just forget it :). Cheers.
@sinisaos Great job!
I haven't used cursor based pagination before, but I do know it's the recommended way as limit + offset isn't very performant in Postgres, and there are weird edge cases where you can miss rows, or get duplicate rows.
If you make a pull request, I'll have a play around. I'm interested to try it out! Thanks.
@dantownsend Thank you. I'll do it later tonight.
Any updates on the implementation here.:)
@satishdash All progress is at #34, but there has been no activity lately. If you have any code or ideas on how to improve it, feel free to contribute and make pull request.
@satishdash Yeah, it has stalled a bit.
Cursor based pagination is really complex - @sinisaos did a great job of getting it to work, but I still struggle to wrap my head around it entirely! Here is the implementation:
https://github.com/piccolo-orm/piccolo_api/pull/34
The problem is it adds a lot of logic to PiccoloCRUD. What I'd like to do is somehow move this logic into a separate paginator class, like Django REST Framework does:
https://github.com/encode/django-rest-framework/blob/98e56e0327596db352b35fa3b3dc8355dc9bd030/rest_framework/pagination.py#L577
That way we can add the new functionality, whilst keeping PiccoloCRUD maintainable. But I haven't been able to make much progress on it recently.
@dantownsend This is done in separate package. You can close this issue.