dlt icon indicating copy to clipboard operation
dlt copied to clipboard

Add 'retry' response action for generic rest api source

Open alex-fedorenk0 opened this issue 1 year ago • 4 comments

Feature description

Extend the default response actions with request retry based on response content.

Are you a dlt user?

I'd consider using dlt, but it's lacking a feature I need.

Use case

I am considering dlt with declarative rest api source as a replacement for custom ingestion script. The API I am using can return incomplete pages of data having response status 200, but 'error' key in response content. Currently this leads to missing data at destination. In my case incremental cursor value is read from last page so previous incomplete pages are never updated.

Proposed solution

Add 'retry' response action based on response content which will trigger retry of rest api request. The error key can be looked up as content substring, but it would be great to have an ability to check for actual keys. As an example, in my case it is "status":"failure" and "error":{"message": ..., "details": ...}. I understand that I use custom Client instance for this use case, but it seems a good addition to declarative source.

Related issues

No response

alex-fedorenk0 avatar Sep 04 '24 16:09 alex-fedorenk0

Hey @alex-fedorenk0, thanks for suggestion. How does the API you're calling work on the retries? Is it returning a complete page on a subsequent call?

burnash avatar Sep 10 '24 13:09 burnash

It may return the complete page on the next attempt, or fail again depending on server load. Idk if this behavior (returning http 200 with error key and incomplete data) is common enough. In the custom client we raise tenacity's TryAgain exception if 'error' key is present in response content or total record count is not equal to expected count. For us it is better to fail the extraction completely and keep the existing state until the next schedule run then to get incomplete pages of data.

alex-fedorenk0 avatar Sep 10 '24 15:09 alex-fedorenk0

It appears that the current version (1.1.0) allows usage of custom callable in response action configuration. So, we can add custom retry functionality by passing the hook which checks response for error key and raises TryAgain exception if 'error' is present.

The same approach can be used to retry when incomplete json payload is returned (by default an exception is thrown)

"response_actions": [
    {
        'status_code': 200,
        'content': 'error',
        'action': [retry_on_json_error,  retry_on_content_error]
    }
]
def retry_on_json_error(response: Response):
    try:
        _ = response.json()
    except JSONDecodeError:
        logger.warning("Invalid JSON")
        raise TryAgain


def retry_on_content_error(response: Response, **kwargs):
    if error := response.json().get("error"):
        logger.warning(error["message"])
        raise TryAgain

alex-fedorenk0 avatar Sep 27 '24 13:09 alex-fedorenk0

Hey @alex-fedorenk0, sorry for the delayed response. Yes, I agree that having rest_api handle the exception in a way similar to IgnoreResponseException is the way to go. This exception would need to be handled in RESTClient:

https://github.com/dlt-hub/dlt/blob/a76a06d9a8032e4bcb64739ce0384b35dbab0977/dlt/sources/helpers/rest_client/client.py#L216-L220

Would you be interested on working on a PR for this feature?

burnash avatar Oct 02 '24 09:10 burnash