
[rest_api source] can't detect pagination for github api and poke api

Open AstrakhantsevaAA opened this issue 1 year ago • 2 comments

dlt version

1.1.0

Describe the problem

The rest_api source cannot autodetect pagination for the GitHub API and the Poke API, which affects our tutorial.

If you run this pipeline, you get a list of Fallback paginator warnings for both the GitHub API and the Poke API.

This fallback also causes a rate-limiting error: the rest_api source keeps sending requests to the GitHub API until the error occurs.

Expected behavior

According to our tutorial, the rest_api source should automatically detect such simple types of pagination.

Steps to reproduce

  1. Run:
     dlt init rest_api duckdb
  2. Run:
     python rest_api_pipeline.py

Operating system

macOS

Runtime environment

Local

Python version

3.11

dlt data source

No response

dlt destination

No response

Other deployment details

No response

Additional information

No response

AstrakhantsevaAA avatar Oct 02 '24 10:10 AstrakhantsevaAA

Thank you for the issue, @AstrakhantsevaAA. I believe rest_api detects the paginator successfully. If I'm not mistaken, the message relates to "child" resources (single page) where no pagination is present. In this case the source uses SinglePagePaginator. Do you see any data loaded when you run the pipelines?
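For anyone who wants to avoid auto-detection altogether, here is a rough sketch of setting the paginator explicitly. It only builds plain dicts in the shape the rest_api config is assumed to take. The base URLs, the resource names, and the "header_link" / "json_link" aliases with `next_url_path` are assumptions taken from the dlt docs as I remember them, not something confirmed in this thread, so please check them against your dlt version.

```python
# Sketch only: plain config dicts, nothing here imports or calls dlt.
# Assumption: the GitHub API paginates via the Link response header, so a
# client-level "header_link" paginator applies to parent and child resources.
github_config = {
    "client": {
        "base_url": "https://api.github.com/repos/dlt-hub/dlt/",  # assumed URL
        "paginator": "header_link",  # assumed alias; skips auto-detection
    },
    "resources": [
        {"name": "issues", "endpoint": {"path": "issues"}},
        {
            "name": "issue_comments",
            "endpoint": {
                "path": "issues/{issue_number}/comments",
                "params": {
                    # Child resource: one request per parent issue.
                    "issue_number": {
                        "type": "resolve",
                        "resource": "issues",
                        "field": "number",
                    },
                },
            },
        },
    ],
}

# Assumption: the Poke API returns the next page URL under the "next" key
# of the JSON body, which the "json_link" paginator can follow.
pokemon_config = {
    "client": {
        "base_url": "https://pokeapi.co/api/v2/",  # assumed URL
        "paginator": {"type": "json_link", "next_url_path": "next"},
    },
    "resources": ["pokemon", "berry", "location"],
}
```

If these keys match your dlt version, each dict would be passed to `rest_api_source(...)` in place of the generated config. Child responses without a Link header should then simply end after one page rather than trigger the fallback detection.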

burnash avatar Oct 02 '24 12:10 burnash

@burnash Yeah, I think you are right; it just isn't clear from the warning message. Still, this part of the tutorial should be adjusted: by default we can't run this pipeline because of rate limits. I think we can reduce the amount of data for the issues endpoints:

"initial_value": pendulum.today().subtract(days=7).to_iso8601_string(),
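To show where that value would sit, here is a small sketch using only the standard library. The `initial_value` helper is hypothetical, and the "incremental" / "cursor_path" keys are my assumption about how the tutorial's `since` parameter is laid out. Note that it uses midnight UTC, whereas `pendulum.today()` uses the local timezone.

```python
from datetime import datetime, timedelta, timezone


def initial_value(days_back: int = 7) -> str:
    """Return an ISO 8601 timestamp for midnight UTC, `days_back` days ago."""
    today = datetime.now(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (today - timedelta(days=days_back)).isoformat()


# Assumed shape of the incremental "since" parameter on the issues endpoint.
since_param = {
    "type": "incremental",
    "cursor_path": "updated_at",
    "initial_value": initial_value(7),  # a 7-day window keeps the request count low
}
```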

And these warnings scare our new users :D Can we log this warning once at the beginning instead of for each request?
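As an illustration of the "log once" idea, a user-side workaround could be a logging filter that drops repeated messages. This is only a sketch, not how dlt handles it: the logger name "example.paginator" and the warning text are made up for the demo, and `OncePerMessageFilter` / `ListHandler` are hypothetical helpers.

```python
import logging


class OncePerMessageFilter(logging.Filter):
    """Let each distinct (level, message) pair through only once."""

    def __init__(self) -> None:
        super().__init__()
        self._seen = set()

    def filter(self, record: logging.LogRecord) -> bool:
        key = (record.levelno, record.getMessage())
        if key in self._seen:
            return False  # drop repeats
        self._seen.add(key)
        return True


class ListHandler(logging.Handler):
    """Collect formatted messages in a list so the demo can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
handler.addFilter(OncePerMessageFilter())

logger = logging.getLogger("example.paginator")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False

# Simulate the same warning being emitted on every request.
for _ in range(3):
    logger.warning("Fallback paginator used: %s", "single_page")
logger.warning("Fallback paginator used: %s", "header_link")
```

The filter is attached to the handler rather than the logger so that it also applies to records coming from child loggers.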

AstrakhantsevaAA avatar Oct 02 '24 16:10 AstrakhantsevaAA