
Native retries [question]

Open aleaficionado opened this issue 1 year ago • 6 comments

Hi folks,

Have you had any thoughts about implementing the retry capability from #406? I know sometimes we do not want to retry, but given the choice between "no retries" and "some retries, even though they will definitely yield the same result", the latter is more attractive from a code maintenance perspective.

Cheers Adi

aleaficionado avatar Aug 11 '22 11:08 aleaficionado

Hi @aleaficionado, I haven't had much time to implement a retry just yet; I am just working through the backlog. If you have time, I am open to a PR.

In the meantime, here is the retry wrapper I use for the s3fs package:

retry_api_call = function(expr, retries){
  if(retries == 0){
    return(eval.parent(substitute(expr)))
  }

  for (i in seq_len(retries + 1)){
    tryCatch({
      return(eval.parent(substitute(expr)))
    }, http_500 = function(err) {
      # https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html
      # HTTP Status Code: 500
      #     Error Code: InternalError
      #     Description: An internal error occurred. Try again.
      #
      # HTTP Status Code: 501
      #     Error Code: NotImplemented
      #     Description: A header that you provided implies functionality that is not implemented.
      #
      # HTTP Status Code: 503
      #     Error Code: ServiceUnavailable
      #     Description: Reduce your request rate.
      #     Error Code: SlowDown
      #     Description: Reduce your request rate.
      #     Error Code: Busy
      #     Description: The service is unavailable. Try again later.
      if(i == (retries + 1))
        stop(err)
      time = 2**i * 0.1
      message(sprintf("Request failed. Retrying in %s seconds...", time))
      Sys.sleep(time)
    }, error = function(err) {
      stop(err)
    })
  }
}
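
For reference, wrapping a paws call in that helper would look something like this (the client creation and the bucket/key values below are placeholders, not part of the s3fs code):

svc <- paws.storage::s3()

# Evaluate the head_object call, retrying up to 3 times on http_500-class
# errors, with exponential backoff between attempts.
resp <- retry_api_call(
  svc$head_object(Bucket = "my-bucket", Key = "my-key"),
  retries = 3
)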

DyfanJones avatar Aug 11 '22 12:08 DyfanJones

In #406 I added a patch (apologies, I'm not up to speed on using GitHub) where I changed paws.common to use httr's retry capability: I swapped httr::VERB for httr::RETRY, with the times parameter defaulting to 1.

Would that not be suitable?
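
For illustration, the change amounts to something like the sketch below inside the code that sends the request; the helper name and the way times is threaded through from the config are assumptions here, not the actual paws.common internals:

# A minimal sketch: the request pieces (verb, url, headers, body) are assumed
# to be assembled by paws.common; times would come from the user's config,
# defaulting to 1 so behaviour is unchanged unless retries are opted into.
send_with_retry <- function(verb, url, headers, body, times = 1) {
  httr::RETRY(
    verb,
    url = url,
    httr::add_headers(.headers = headers),
    body = body,
    times = times,      # 1 attempt = no retries (current behaviour)
    pause_base = 1,     # exponential backoff base, in seconds
    pause_cap = 60,     # upper bound on the backoff delay
    quiet = FALSE
  )
}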

aleaficionado avatar Aug 11 '22 13:08 aleaficionado

Sorry, not sure if the above was clear, but the patch should allow all requests (across all paws packages) to retry without additional retry logic (other than setting the config parameter).

If httr ever gets enhanced (the conditions governing retry logic really belong in that package), then those enhancements will be picked up when the dependency is upgraded.

aleaficionado avatar Aug 11 '22 13:08 aleaficionado

Hey, sorry for the delay on this; I have had to step back from paws development since this originally came up.

The issue with using RETRY by default is that if an operation is not retryable -- invalid request, permission denied, etc. -- the consequences of retrying may be bad, e.g. inadvertently triggering throttling or lockouts. So we wouldn't want that to be the default. Somewhere in the AWS API definitions is whether an operation is retryable, and that is what I think we need to use to decide when to retry.

davidkretch avatar Aug 11 '22 23:08 davidkretch

No problem. Hope all is well.

The only throttling I've hit with AWS was when writing hundreds of thousands of files to S3 at the same time from one of their other services. However, that was just a brief throttle on the operations (seconds) rather than a permanent throttle on the account or on use of the service.

As far as I am aware, AWS (like most cloud providers) does not have lockouts to prevent attacks. You cannot even set a lockout policy on a password (let alone on keys), even if you try.

> The issue with using RETRY by default is that if an operation is not retryable -- invalid request, permission denied, etc. -- the consequences of retrying may be bad, e.g. inadvertently triggering throttling or lockouts. So we wouldn't want that to be the default. Somewhere in the AWS API definitions is whether an operation is retryable, and that is what I think we need to use to decide when to retry.

Regardless, if the default number of tries passed to httr is set to one (i.e. no retry), then the existing behaviour is kept, which should alleviate concern about the above two items.

AWS does define a retry policy for custom APIs (https://docs.aws.amazon.com/general/latest/gr/api-retries.html) at the code level rather than the API level. I think this should be standard for all APIs (or even all HTTP requests), so it really should be implemented in httr's RETRY function rather than in individual API packages.
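
For reference, the policy that page describes boils down to exponential backoff with jitter; here is a minimal sketch of the delay calculation (the base and cap values are illustrative, not the exact AWS numbers):

# Wait time before retry attempt i (1, 2, 3, ...): exponential growth, capped,
# with random jitter so concurrent clients do not retry in lock-step.
backoff_delay <- function(i, base = 0.1, cap = 20) {
  stats::runif(1, min = 0, max = min(cap, base * 2^i))
}

# Example: delays drawn for three successive attempts.
sapply(1:3, backoff_delay)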

aleaficionado avatar Aug 12 '22 06:08 aleaficionado

Notes:

  1. AWS documentation states to retry on server (5xx) errors and throttling errors, and not to retry client (4xx) errors (a rough sketch of this classification follows these notes).
  2. Python retry docs: https://github.com/boto/boto3/blob/0b82bf9843ad6d350b48442c47f4a484a886ee3f/docs/source/guide/retries.rst. Uses both HTTP status code and error/exception.
  3. Go retry: https://github.com/aws/aws-sdk-go/blob/b895bbc5c979835963726f6db80db7b5894b2da1/aws/request/retryer.go. Uses both HTTP status code and error/exception. Tests: https://github.com/aws/aws-sdk-go/blob/main/aws/request/retryer_test.go
  4. Probably not helpful: Get the list of expected errors/exceptions for an operation from json path operations->OperationName->errors->ExceptionName, e.g.
  "operations": {
    "AssociateCreatedArtifact": {
      "name": "AssociateCreatedArtifact",
      ...
      "errors": [
        {
          "shape": "AccessDeniedException"
        },
  5. Get the error/exception from an operation response with tbd.
  6. Check EC2 throttling in integration tests.
  7. Open question: what is the difference in effect between status codes and exceptions? What status codes do exceptions come with?
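
To make notes 1-3 concrete, here is a rough sketch of how retryability might be classified from the HTTP status code plus the service error code; the error-code list below is illustrative, not the definitive set the SDKs use:

# Illustrative only: the real SDKs derive these sets from their API
# definitions and retry configuration, not from a hard-coded list.
retryable_status <- c(500L, 502L, 503L, 504L)
throttling_codes <- c("Throttling", "ThrottlingException", "TooManyRequestsException",
                      "RequestLimitExceeded", "SlowDown")

is_retryable <- function(status_code, error_code = NULL) {
  if (status_code %in% retryable_status) return(TRUE)   # server-side (5xx) errors
  if (!is.null(error_code) && error_code %in% throttling_codes) return(TRUE)  # throttling
  FALSE                                                 # other client (4xx) errors: do not retry
}

is_retryable(503L)                          # TRUE
is_retryable(400L, "ThrottlingException")   # TRUE (throttled despite a 4xx status)
is_retryable(404L)                          # FALSE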

davidkretch avatar Aug 17 '22 03:08 davidkretch

It would be super nice to have this, and if it's just for 5xx errors, it would make a lot of sense to build it natively into paws.

When this does become available, what will be the retry settings and default values (number of retries, timeout, etc.)? And will these become new config elements in e.g. paws.storage::s3()?

wlandau avatar Aug 24 '23 14:08 wlandau

googleAuthR supports global options, i.e. getOption("googleAuthR.tryAttempts"). I like this kind of interface for retries.
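
A paws analogue of that interface might look something like this (the option name paws.retries is hypothetical):

# Set once per session or in .Rprofile:
options(paws.retries = 3)

# ...and read wherever the request is retried, falling back to no retries:
retries <- getOption("paws.retries", default = 0)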

wlandau avatar Aug 24 '23 20:08 wlandau

Hi @wlandau, I will move this up my priority list :) (should really write that down :P). With the upcoming paginate functions we have some basic retry logic, https://github.com/paws-r/paws/blob/main/paws.common/R/paginate.R (it only retries throttling). However, I am keen to get a more long-term implementation in place.

Currently I'm not 100% sure where it will sit architecturally in paws. I guess we will try to align as closely as possible with the other SDKs.

DyfanJones avatar Aug 24 '23 20:08 DyfanJones

I believe I will need to dig into the resource @davidkretch highlighted above, as it doesn't seem as straightforward as retrying HTTP 5xx errors (https://docs.aws.amazon.com/sdkref/latest/guide/feature-retry-behavior.html):

> Transient errors (HTTP status codes 400, 408, 500, 502, 503, and 504) and throttling errors (HTTP status codes 400, 403, 429, 502, 503, and 509) can all potentially be retried. SDK retry behavior is determined in combination with error codes or other data from the service.

DyfanJones avatar Aug 24 '23 20:08 DyfanJones

I agree with most of those codes, but 400 is going to be tough on my end. In {targets}, I need to be able to check if an object exists. If it genuinely doesn't exist, I do not want to spend any time retrying.

wlandau avatar Aug 24 '23 21:08 wlandau

Aye, HTTP 400 is going to be a pain. I will take some time to look at how to implement this after I finish the paws.common 0.6.0 CRAN release.

Side note (you have probably already implemented this): if you are checking whether an object exists in S3, then I believe HTTP 404 is the correct error code:

https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html

I take this approach in the s3fs R implementation: https://github.com/DyfanJones/s3fs/blob/main/R/s3filesystem_class.R#L420
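
For example, an existence check along those lines might look like the sketch below, assuming the missing key surfaces as an http_404 condition (as in the retry helper earlier in this thread); the bucket and key names are placeholders:

s3_object_exists <- function(bucket, key, svc = paws.storage::s3()) {
  tryCatch({
    svc$head_object(Bucket = bucket, Key = key)
    TRUE
  },
  # Treat "no such key" (HTTP 404) as FALSE rather than as an error.
  http_404 = function(err) FALSE
  )
}

s3_object_exists("my-bucket", "my-key")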

DyfanJones avatar Aug 24 '23 21:08 DyfanJones

I tried to use http_404, but then I got spurious HTTP 400 errors in some of my tests. For example, if I want to check whether a version of an object exists in a versioned bucket, the error code is unfortunately 400, not 404 (a serialization error).

wlandau avatar Aug 24 '23 22:08 wlandau

I might be going crazy, but I believe this "should" be fairly easy to implement.

I have raised a draft PR: https://github.com/paws-r/paws/pull/660

Currently it implements the standard retry mode described in: https://github.com/boto/boto3/blob/0b82bf9843ad6d350b48442c47f4a484a886ee3f/docs/source/guide/retries.rst#standard-retry-mode

remotes::install_github("dyfanjones/paws/paws.common", ref = "native_retry")

By default the maximum number of retries is 3. @wlandau, please feel free to try it out; I am keen to get any feedback so I can squeeze this into the next paws.common release.

DyfanJones avatar Aug 25 '23 11:08 DyfanJones

Closing, as paws.common 0.6.0 has been released on CRAN.

DyfanJones avatar Sep 07 '23 12:09 DyfanJones