dlt icon indicating copy to clipboard operation
dlt copied to clipboard

Update RESTClient range paginators to support stopping after empty pages

Open timoguin opened this issue 1 year ago • 3 comments

Feature description

Update RESTClient's PageNumberPaginator to optionally stop paging after receiving empty pages.

I'm unsure if other paginators need this same functionality. Perhaps it needs to be added to the base RangePaginator class.

Are you a dlt user?

Yes, I'm already a dlt user.

Use case

I am pulling data from a few REST APIs that do not include the information necessary to determine the total number of records or pages. Setting the max number of pages isn't sufficient because the REST helper will keep sending requests up to the max page limit, even if everything past the first page is empty.

Proposed solution

Add a boolean keyword argument to the PageNumberPaginator (or its base class RangePaginator) called stop_after_empty that is False by default.

When it is true, consider the paging complete when the first empty page is received.

Related issues

No response

timoguin avatar Jul 24 '24 21:07 timoguin

Thank you for your feature request @timoguin! In the specification, you wrote that the default for this new flag should be False. I wonder whether defaulting to True would be more practical.

Have you encountered a case where a page delivers empty results but the next page is not empty?

What do you think @burnash?

If you like, please review this implementation: https://github.com/dlt-hub/dlt/pull/1677

willi-mueller avatar Aug 09 '24 10:08 willi-mueller

Agree with @willi-mueller, practically it makes sense to stop after an empty page by default. I'm curious about your experience @timoguin

burnash avatar Aug 09 '24 13:08 burnash

@burnash @willi-mueller I only proposed setting it to False to retain the current default behavior, but I agree that it makes sense to stop after empty pages by default.

timoguin avatar Aug 09 '24 15:08 timoguin