
Add option to retry rate-limited requests

Open jskrzypek opened this issue 3 years ago • 2 comments

We tend to get a lot of failed airtable requests due to the rate limits, and are using Tenacity to retry these requests automatically, which usually works well... I've created a wrapper function to handle this generically:

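import os

from pyairtable import Table
from tenacity import retry, wait_fixed, stop_after_attempt

# BASE_ID and TABLE_NAME are assumed to be defined elsewhere.
FOO_TABLE = Table(os.environ.get("AIRTABLE_API_KEY"), BASE_ID, TABLE_NAME)


# Make up to 3 attempts, waiting 3 seconds between them.
@retry(wait=wait_fixed(3), stop=stop_after_attempt(3))
def call_airtable(tbl, *args, method="all", **kwargs):
    # Look up the Table method by name and call it with the given arguments.
    return getattr(tbl, method)(*args, **kwargs)


def get_foo(foo_id):
    return call_airtable(FOO_TABLE, foo_id, method="get")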

While this strategy generally works for us, it wraps the whole pyAirtable function, and for some functions in the pyAirtable API (iterate(), all(), batch_create(), and batch_update()) it is suboptimal: these make several requests per call, so one rate-limited request causes the entire call to be retried from the start. It seems to me that it would be much better to patch the retry functionality into the inner _request() method of the ApiAbstract superclass, so only the actual chunk or offset request that was rate-limited gets retried.
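
To illustrate the difference, here is a minimal, self-contained sketch. It does not use pyAirtable at all: FakePagedApi, with_retry, and fetch_all are hypothetical names standing in for a paginated endpoint, a retry helper, and a pagination loop, and the helper skips the wait between attempts for brevity.

```python
class RateLimited(Exception):
    """Stands in for an HTTP 429 response."""


class FakePagedApi:
    """Hypothetical paginated endpoint that fails on chosen call numbers."""

    def __init__(self, pages, fail_on_calls):
        self.pages = pages
        self.fail_on_calls = set(fail_on_calls)
        self.calls = 0  # total requests made, including failed ones

    def get_page(self, offset):
        self.calls += 1
        if self.calls in self.fail_on_calls:
            raise RateLimited()
        return self.pages[offset]


def with_retry(func, attempts=3):
    """Call func(), retrying on RateLimited up to `attempts` times in total."""
    for attempt in range(attempts):
        try:
            return func()
        except RateLimited:
            if attempt == attempts - 1:
                raise


def fetch_all(api, retry_each_page):
    """Collect every page, optionally retrying each page request on its own."""
    records = []
    for offset in range(len(api.pages)):
        if retry_each_page:
            page = with_retry(lambda: api.get_page(offset))
        else:
            page = api.get_page(offset)
        records.extend(page)
    return records


pages = [[1, 2], [3, 4], [5, 6]]

# Outer retry: the third request fails, so all three pages are fetched again.
outer = FakePagedApi(pages, fail_on_calls={3})
outer_records = with_retry(lambda: fetch_all(outer, retry_each_page=False))

# Inner retry: only the failed third request is repeated.
inner = FakePagedApi(pages, fail_on_calls={3})
inner_records = fetch_all(inner, retry_each_page=True)

print(outer.calls, inner.calls)
```

Both variants return the same records, but the outer wrapper spends extra requests against the same rate limit that caused the failure in the first place.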

@gtalarico, would you be open to a PR implementing retry functionality on the inner API calls? I see you previously looked into this and closed the PR in #93.

For comparison, the official airtable.js JS library has this functionality built in.

In terms of implementation strategy I think it may make the most sense to add something a little generic. I have in mind to add a request_strategy option to ApiAbstract.__init__(), that would allow users to provide an alternative requests.Session-like object that is responsible for making the underlying request to Airtable's API. Then pyAirtable would provide two built-in options, the default, bare-bones requests.Session option, and a session object where the .request() method is wrapped with a retry strategy that retries whenever it receives a 429, and adds exponential backoff with some random jitter just like how airtable.js does it.

As an added benefit this approach might open the door for users to provide an async request library if they want without much effort on your part.
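
A rough sketch of what such a session-like wrapper could look like, assuming only that the wrapped object exposes request(method, url, **kwargs) and returns something with a status_code attribute. RetryingSession and its parameters are hypothetical names for illustration, not part of pyAirtable, and the backoff formula is a generic "exponential with random jitter" choice rather than a copy of what airtable.js does.

```python
import random
import time


class RetryingSession:
    """Wraps a session-like object and retries requests that return 429."""

    def __init__(self, session, max_retries=5, base_delay=0.5,
                 max_delay=30.0, sleep=time.sleep):
        self.session = session
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.sleep = sleep  # injectable so tests do not have to wait

    def backoff(self, attempt):
        """Random delay between 0 and an exponentially growing, capped bound."""
        upper = min(self.max_delay, self.base_delay * (2 ** attempt))
        return random.uniform(0, upper)

    def request(self, method, url, **kwargs):
        response = self.session.request(method, url, **kwargs)
        for attempt in range(self.max_retries):
            if response.status_code != 429:
                break
            # Rate-limited: wait, then repeat only this one request.
            self.sleep(self.backoff(attempt))
            response = self.session.request(method, url, **kwargs)
        return response
```

With the requests library installed, RetryingSession(requests.Session()) would be one way to use it, since requests.Session.request() accepts the same method and url arguments. Because the wrapper only relies on duck typing, any other client with a compatible request() method could be swapped in.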

jskrzypek, Feb 28 '22 17:02

Hi @jskrzypek - thanks for the detailed write up. I think this would be a good addition.

I like the request_strategy approach.

If we use tenacity, it would be good if we could add it as an optional dependency.

e.g. pip install pyairtable[tenacity], so users who don't need this level of retry can skip it, or can provide a simpler implementation.

gtalarico, Feb 28 '22 19:02

@gtalarico I will pick this up again later (I've burned more time on it than I ought to have, TBH), but I've added a draft PR for you to check out and offer feedback on. It contains the implementation of the functional changes to the library and implementation of the RequestStrategy pattern. It's still a WIP, but I wanted to get buy-in on the current implementation before devoting any more time to it.

jskrzypek, Mar 01 '22 22:03

Cleaning up old issues, this was (partially?) resolved in #272. By default, pyAirtable will retry when it receives 429 (max quota reached) but it will not retry 500s. Not all API operations are idempotent (update/delete generally are, but create generally isn't) and a failure during a batch operation might indicate partial success (which means the application might have to re-read the data from the API before it knows how to proceed).

I think it's best for application developers to use their own judgment when crafting more complex retry behavior, but the library now presents a sensible (and safe) default.
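
As one illustration of that kind of application-level judgment, here is a minimal sketch of a retry decision function. The operation names and the should_retry helper are hypothetical and not part of pyAirtable; the set of operations treated as safe to repeat simply mirrors the rule of thumb above (update and delete usually idempotent, create usually not).

```python
# Assumption: the application labels its own calls with these operation names.
IDEMPOTENT_OPERATIONS = frozenset({"get", "update", "delete"})


def should_retry(operation, status_code):
    """Decide whether a failed call is reasonable to repeat automatically."""
    if status_code == 429:
        # Rate-limited: retry regardless of operation.
        return True
    if 500 <= status_code < 600:
        # Server error: the request may or may not have been applied,
        # so only repeat operations that are safe to apply twice.
        return operation in IDEMPOTENT_OPERATIONS
    return False


print(should_retry("create", 429))
print(should_retry("create", 500))
print(should_retry("update", 503))
```

A failed batch create would fall into the "do not retry blindly" case here, leaving the application to re-read the data and work out which records were actually written.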

mesozoic, Sep 22 '23 21:09