Additional paginator for offset without a "total_param"
Feature description
Some APIs use offset and limit pagination but do not return a total count (the total_param) in the response. My proposed solution adds a paginator option that checks whether the records list in the response is empty. If it is, the loop stops and the data is handled further (normalization and loading).
Are you a dlt user?
Yes, I'm already a dlt user.
Use case
Adds a pagination option for offset and limit pagination types which do not have a total_param in the response.
Proposed solution
from requests import Request, Response

from dlt.sources.helpers.rest_client.paginators import BasePaginator


class OffsetNoTotalPaginator(BasePaginator):
    def __init__(
        self,
        limit: int,
        offset: int = 0,
        offset_param: str = "offset",
        limit_param: str = "limit",
        list_param: str = "records",
    ) -> None:
        """
        Args:
            limit (int): The maximum number of items to retrieve
                in each request.
            offset (int): The offset for the first request.
                Defaults to 0.
            offset_param (str): The query parameter name for the offset.
                Defaults to 'offset'.
            limit_param (str): The query parameter name for the limit.
                Defaults to 'limit'.
            list_param (str): The response key that contains the list of results.
                Defaults to 'records'.
        """
        super().__init__()
        self.limit_param = limit_param
        self.limit = limit
        self.offset_param = offset_param
        self.offset = offset
        self.list_param = list_param

    def init_request(self, request: Request) -> None:
        # Set offset and limit on the very first request as well
        self.update_request(request)

    def update_state(self, response: Response) -> None:
        # Assumes that the API returns an empty list when no more data is available
        if not response.json()[self.list_param]:
            self._has_next_page = False
        else:
            self.offset += self.limit

    def update_request(self, request: Request) -> None:
        # request.params can be None when no query parameters were set
        if request.params is None:
            request.params = {}
        request.params[self.offset_param] = self.offset
        request.params[self.limit_param] = self.limit
Related issues
No response
Would be a good feature.
Spitballing on the implementation.
Imo this should use the page data list, which is already resolved by the client before calling paginator.update_state. Maybe paginators should always have access to it, i.e. we change the signature of update_state to:
def update_state(self, response: Response, page_data: List[Any]):
Existing OffsetPaginator can be extended instead of making this a separate class:
- Make the total_path argument optional
- When total_path is missing, fall back to checking whether page_data is empty
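To make the two bullets above concrete, here is a rough, stand-alone sketch of the idea. It deliberately does not subclass dlt's OffsetPaginator: `SketchOffsetPaginator`, `total_key`, `has_next_page` and `fetch_all` are hypothetical names made up for this illustration, and an in-memory list stands in for the API. It only assumes that `update_state` receives the already-resolved `page_data`, as suggested above.

```python
from typing import Any, Dict, List, Optional


class SketchOffsetPaginator:
    """Stand-alone illustration; NOT dlt's OffsetPaginator."""

    def __init__(
        self, limit: int, offset: int = 0, total_key: Optional[str] = None
    ) -> None:
        self.limit = limit
        self.offset = offset
        # Hypothetical stand-in for an optional total_path
        self.total_key = total_key
        self.has_next_page = True

    def update_state(self, body: Dict[str, Any], page_data: List[Any]) -> None:
        # An empty page ends pagination, whether or not a total is reported.
        if not page_data:
            self.has_next_page = False
            return
        self.offset += self.limit
        # If a total is available, stop once the next offset reaches it.
        if self.total_key is not None and self.offset >= int(body[self.total_key]):
            self.has_next_page = False


def fetch_all(rows: List[int], paginator: SketchOffsetPaginator) -> List[int]:
    """Drive the paginator against an in-memory list standing in for an API."""
    collected: List[int] = []
    while paginator.has_next_page:
        page = rows[paginator.offset : paginator.offset + paginator.limit]
        body = {"records": page, "total": len(rows)}
        paginator.update_state(body, page)
        collected.extend(page)
    return collected


rows = list(range(5))
without_total = fetch_all(rows, SketchOffsetPaginator(limit=2))
with_total = fetch_all(rows, SketchOffsetPaginator(limit=2, total_key="total"))
```

Both calls collect the same rows. The difference is that without a total the loop needs one extra request that returns an empty page before it stops, which is the trade-off of the fallback.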
This functionality has been requested again in https://github.com/dlt-hub/dlt/issues/1637. After an internal discussion, we decided to proceed with implementing it in #1677.