
[experiment] Add resource time limit and rate limiting

Open sh-rp opened this issue 1 year ago • 1 comments

Description

This PR extends the add_limit function to add time limits and rate limits to resources. The approach is still up for discussion, but it is straightforward, easy to test, and works for both sync and async resources.
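To make the idea concrete, here is a minimal sketch of a generator wrapper that enforces a time limit and a minimum wait between items. It is not the code from this PR and not the dlt API: `limit_items`, `max_time`, `min_wait`, `clock`, and `sleep` are hypothetical names chosen for illustration, and only the sync case is shown.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def limit_items(
    items: Iterable[T],
    max_time: Optional[float] = None,
    min_wait: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Hypothetical helper: stop after max_time seconds and keep at
    least min_wait seconds between two pulls from the source."""
    it = iter(items)
    start = clock()
    last_pull: Optional[float] = None
    while True:
        # Time limit: checked before pulling, so no fetched item is dropped.
        if max_time is not None and clock() - start >= max_time:
            return
        # Rate limit: wait until min_wait has passed since the last pull.
        if min_wait is not None and last_pull is not None:
            remaining = min_wait - (clock() - last_pull)
            if remaining > 0:
                sleep(remaining)
        try:
            item = next(it)
        except StopIteration:
            return
        last_pull = clock()
        yield item
```

The clock and sleep functions are injectable so that the behavior can be tested without real waiting. An async variant would follow the same shape with `async for` and `asyncio.sleep`.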

TODOs (if we go this route):

  • More tests covering pipe iterators with multiple resources (for example, we want to allow a global rate limit for APIs)
  • Docs
  • Extend source level add_limit to have the same functionality as the resource level one
  • Improve the logger warning if add_limit is declared on non-incremental resources
  • Investigate the FIFO extractor strategy, maybe do not go to round robin if none is yielded...

Other thoughts:

  • We might want to apply the rate limit wait also once before the original generator is used plus allow rate limiting on the transformers, otherwise global rate limiting for APIs will not work.
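The point about a global limit can be sketched as one limiter object shared by several generators, with the wait applied before every pull, including the first one. This is an illustration under assumptions, not the PR's implementation: `SharedRateLimiter` and `rate_limited` are hypothetical names, and the sketch is single-threaded (no locking).

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


class SharedRateLimiter:
    """Hypothetical limiter shared by several resources or transformers."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        # Sleep until min_interval has passed since any previous call,
        # no matter which generator made that call.
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def rate_limited(items: Iterable[T], limiter: SharedRateLimiter) -> Iterator[T]:
    it = iter(items)
    while True:
        limiter.wait()  # runs before the first pull as well
        try:
            item = next(it)
        except StopIteration:
            return
        yield item
```

Because the limiter state lives outside the generators, two wrapped resources that are evaluated in round-robin order still respect a single combined interval.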

sh-rp avatar Jun 18 '24 09:06 sh-rp

Deploy Preview for dlt-hub-docs canceled.

Name Link
Latest commit 5fc23245c67e3654dee8ddd588a9109c72d292ba
Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/6734869165314d0008ffccee

netlify[bot] avatar Jun 18 '24 09:06 netlify[bot]

  • We might want to apply the rate limit wait also once before the original generator is used plus allow rate limiting on the transformers, otherwise global rate limiting for APIs will not work.

🎉

I like where this is going. What are your thoughts on how this could be used in conjunction with headers returned from API endpoints? I have this API for example: https://developer.affinity.co/#section/Getting-Started/Rate-Limits

which returns headers in each response: [screenshot of the response headers, 2024-11-28]

It basically allows you to adjust your requests based on a per-API-key/user limit and an org-wide limit. The current REST client handles 429 responses, but my plan is to preempt them as much as I can, so my goal would be to use these headers dynamically (e.g. update the throttling behavior after each response).
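A rough sketch of what "update the throttling behavior after each response" could look like: derive the next delay from the remaining quota and the time until the window resets. The header names below are placeholders, not the names any particular API returns, and `delay_from_headers` is a hypothetical helper rather than part of dlt.

```python
from typing import Mapping


def delay_from_headers(
    headers: Mapping[str, str],
    remaining_key: str = "X-RateLimit-Remaining",  # placeholder header name
    reset_key: str = "X-RateLimit-Reset",  # placeholder: seconds until reset
) -> float:
    """Return how many seconds to wait before the next request."""
    try:
        remaining = int(headers[remaining_key])
        reset_in = float(headers[reset_key])
    except (KeyError, ValueError):
        # Headers missing or malformed: do not throttle on their account.
        return 0.0
    if reset_in <= 0:
        return 0.0
    if remaining <= 0:
        # Quota exhausted: wait for the whole window to reset.
        return reset_in
    # Spread the remaining requests evenly over the rest of the window.
    return reset_in / remaining
```

With both a per-user and an org-wide limit, one conservative option would be to compute a delay for each pair of headers and sleep for the larger of the two.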

joscha avatar Nov 28 '24 16:11 joscha

Closing this PR in favor of:

  • https://github.com/dlt-hub/dlt/pull/2149
  • https://github.com/dlt-hub/dlt/pull/2131

@joscha the API rate limiting you suggested would be a layer above in the rest_api implementation. There might already be a ticket for this; if not, you could open a new one.

sh-rp avatar Dec 15 '24 19:12 sh-rp

would be a layer above in the rest_api implementation

Okay. How so, if it relies on a 429 response or an explicit REST API resource returning the limits? Would you assume each resource to have some sort of hook to report back any time limits?

joscha avatar Dec 15 '24 19:12 joscha