
Additional paginator for offset without a "total_param"

Open Thommel-nl opened this issue 1 year ago • 2 comments

Feature description

Some APIs use offset/limit pagination but do not return a total count (total_param) in the response. My proposed solution adds a paginator that checks whether the records list in the response is empty. If it is, the pagination loop stops and the collected data moves on to normalization and loading.
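Here is a minimal sketch of the stop-on-empty loop described above, independent of dlt. `paginate_until_empty` and `fake_fetch` are hypothetical names used only for illustration. `fake_fetch` stands in for an HTTP call that returns the records list of one page.

```python
from typing import Any, Callable, Iterator, List

# Hypothetical page fetcher: takes (offset, limit) and returns the records
# list for that page (e.g. response.json()["records"]).
FetchPage = Callable[[int, int], List[Any]]


def paginate_until_empty(fetch_page: FetchPage, limit: int, offset: int = 0) -> Iterator[List[Any]]:
    """Yield pages until the API returns an empty records list."""
    while True:
        page = fetch_page(offset, limit)
        if not page:
            # Empty list: no more data, so stop paginating.
            return
        yield page
        offset += limit


# Fake API over 7 records, used only to illustrate the loop.
DATA = list(range(7))
calls: List[int] = []


def fake_fetch(offset: int, limit: int) -> List[Any]:
    calls.append(offset)  # record which offsets were requested
    return DATA[offset:offset + limit]


pages = list(paginate_until_empty(fake_fetch, limit=3))
print(pages)  # [[0, 1, 2], [3, 4, 5], [6]]
print(calls)  # [0, 3, 6, 9]
```

The loop always issues one final request that comes back empty (offset 9 here). That extra call is the cost of not knowing the total up front.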

Are you a dlt user?

Yes, I'm already a dlt user.

Use case

Adds a pagination option for offset/limit APIs that do not include a total_param in the response.

Proposed solution

from dlt.sources.helpers.rest_client.paginators import BasePaginator
from requests import Request, Response


class OffsetNoTotalPaginator(BasePaginator):
    """Offset/limit paginator for APIs that do not return a total count.

    Stops paginating as soon as a page comes back with an empty results list.
    """

    def __init__(
        self,
        limit: int,
        offset: int = 0,
        offset_param: str = "offset",
        limit_param: str = "limit",
        list_param: str = "records",
    ) -> None:
        """
        Args:
            limit (int): The maximum number of items to retrieve
                in each request.
            offset (int): The offset for the first request.
                Defaults to 0.
            offset_param (str): The query parameter name for the offset.
                Defaults to 'offset'.
            limit_param (str): The query parameter name for the limit.
                Defaults to 'limit'.
            list_param (str): The response key that contains the list of results.
                Defaults to 'records'.
        """
        super().__init__()
        self.limit_param = limit_param
        self.limit = limit
        self.offset_param = offset_param
        self.offset = offset
        self.list_param = list_param

    def init_request(self, request: Request) -> None:
        # Set offset and limit on the very first request.
        self.update_request(request)

    def update_state(self, response: Response) -> None:
        # Assumes the API returns an empty list when no more data is available;
        # a missing list_param key is treated the same way instead of raising KeyError.
        if not response.json().get(self.list_param):
            self._has_next_page = False
        else:
            self.offset += self.limit

    def update_request(self, request: Request) -> None:
        # Write the current offset and limit into the query parameters.
        request.params[self.offset_param] = self.offset
        request.params[self.limit_param] = self.limit
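To show the order in which a client would call these hooks, here is a dlt-free simulation. `FakeRequest`, `FakeResponse`, `fake_send` and `run` are hypothetical stand-ins. The loop in `run` is an assumption about how a REST client might drive a paginator, not dlt's actual `RESTClient` code. `SketchOffsetNoTotalPaginator` copies the proposed logic but drops the `BasePaginator` base class and exposes `has_next_page` as a plain attribute.

```python
from typing import Any, Dict, List, Tuple


class FakeRequest:
    """Stand-in for requests.Request: only the params dict is needed here."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}


class FakeResponse:
    """Stand-in for requests.Response with a fixed JSON body."""

    def __init__(self, body: Dict[str, Any]) -> None:
        self._body = body

    def json(self) -> Dict[str, Any]:
        return self._body


class SketchOffsetNoTotalPaginator:
    """Same logic as the proposed class, without the dlt base class."""

    def __init__(self, limit: int, offset: int = 0, list_param: str = "records") -> None:
        self.limit = limit
        self.offset = offset
        self.list_param = list_param
        self.has_next_page = True

    def init_request(self, request: FakeRequest) -> None:
        self.update_request(request)

    def update_state(self, response: FakeResponse) -> None:
        if not response.json().get(self.list_param):
            self.has_next_page = False
        else:
            self.offset += self.limit

    def update_request(self, request: FakeRequest) -> None:
        request.params["offset"] = self.offset
        request.params["limit"] = self.limit


def fake_send(request: FakeRequest, data: List[int]) -> FakeResponse:
    """Pretend API: slice `data` using the offset/limit query params."""
    start, size = request.params["offset"], request.params["limit"]
    return FakeResponse({"records": data[start:start + size]})


def run(paginator: SketchOffsetNoTotalPaginator, data: List[int]) -> Tuple[List[int], List[int]]:
    """Drive the paginator hooks and return (offsets sent, records collected)."""
    request = FakeRequest()
    paginator.init_request(request)
    sent_offsets: List[int] = []
    records: List[int] = []
    while True:
        sent_offsets.append(request.params["offset"])
        response = fake_send(request, data)
        records.extend(response.json()["records"])
        paginator.update_state(response)
        if not paginator.has_next_page:
            break
        paginator.update_request(request)
    return sent_offsets, records


offsets, records = run(SketchOffsetNoTotalPaginator(limit=2), list(range(5)))
print(offsets)  # [0, 2, 4, 6]
print(records)  # [0, 1, 2, 3, 4]
```

With five records and a limit of 2, the request at offset 6 is the empty page that sets `has_next_page` to False.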

Related issues

No response

Thommel-nl, Jul 01 '24

Would be a good feature.

Spitballing on the implementation:
Imo this should use the page data list, which the client has already resolved before calling paginator.update_state. Maybe paginators should always have access to it, i.e. we change the signature of update_state to:

def update_state(self, response: Response, page_data: List[Any]):

The existing OffsetPaginator can be extended instead of adding a separate class:

  1. Make the total_path argument optional
  2. When total_path is missing, fall back to checking whether page_data is empty
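A rough sketch of the two steps above, assuming `update_state` receives the already-extracted `page_data` as proposed. `SketchOffsetPaginator` and `crawl` are hypothetical names. The response body is passed as a plain dict and `total_path` is treated as a top-level key rather than a JSON path, to keep the example self-contained.

```python
from typing import Any, Dict, List, Optional


class SketchOffsetPaginator:
    """Hypothetical OffsetPaginator with an optional total_path."""

    def __init__(self, limit: int, offset: int = 0,
                 total_path: Optional[str] = "total") -> None:
        self.limit = limit
        self.offset = offset
        self.total_path = total_path
        self.has_next_page = True

    def update_state(self, body: Dict[str, Any], page_data: List[Any]) -> None:
        if self.total_path is None:
            # Step 2: no total available, so stop on the first empty page.
            if not page_data:
                self.has_next_page = False
            else:
                self.offset += self.limit
        else:
            # Total available: stop once the next offset reaches it.
            self.offset += self.limit
            if self.offset >= body[self.total_path]:
                self.has_next_page = False


def crawl(paginator: SketchOffsetPaginator, data: List[int], include_total: bool) -> List[int]:
    """Simulate requests against `data` and return the offsets requested."""
    requested: List[int] = []
    while paginator.has_next_page:
        page = data[paginator.offset:paginator.offset + paginator.limit]
        body: Dict[str, Any] = {"items": page}
        if include_total:
            body["total"] = len(data)
        requested.append(paginator.offset)
        paginator.update_state(body, page)
    return requested


data = list(range(5))
with_total = crawl(SketchOffsetPaginator(limit=2), data, include_total=True)
without_total = crawl(SketchOffsetPaginator(limit=2, total_path=None), data, include_total=False)
print(with_total)     # [0, 2, 4]
print(without_total)  # [0, 2, 4, 6]
```

With a total, the paginator stops after offset 4. Without one, it needs the extra empty request at offset 6 before it can stop.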

steinitzu, Jul 05 '24

This functionality has been requested again in https://github.com/dlt-hub/dlt/issues/1637. After an internal discussion, we decided to proceed with implementing it in #1677.

burnash, Aug 09 '24