archivetar
Have archivetar not immediately fail if Globus is unavailable temporarily
Currently, if Globus fails (for example, when its httpd dies, as it sometimes does), archivetar fails immediately.
The desired outcome is for archivetar to keep retrying for some period of time before giving up.
Example error:
Unable to connect to <>:443\\nglobus_xio: System error in connect: Connection refused\\nglobus_xio: A system call failed: Connection refused\\n\n", 'eHotMF73v')
I'm looking at https://github.com/jd/tenacity to implement some retries. I also have an issue open with the Globus team to see if they have anything built in or a recommended best practice.
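To make the desired behaviour concrete, here is a minimal standard-library sketch of "retry for a while, then give up". It is not tenacity and not archivetar's code: the function name, the defaults, and the choice of ConnectionError as the retryable exception are all assumptions made for illustration.

```python
import time


def retry_until_deadline(func, retry_on=(ConnectionError,), max_wait=300.0,
                         delay=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call func() until it succeeds or max_wait seconds have elapsed.

    Hypothetical helper: the names and defaults are illustrative only.
    """
    deadline = clock() + max_wait
    while True:
        try:
            return func()
        except retry_on:
            # Give up (re-raise) once another wait would run past the deadline.
            if clock() + delay > deadline:
                raise
            sleep(delay)
```

A call such as retry_until_deadline(submit_transfer, max_wait=600) would then keep trying for roughly ten minutes, where submit_transfer stands in for whatever callable talks to Globus. The sleep and clock parameters exist only so the loop can be exercised without real waiting.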
From the Globus team:
The SDK supports timeout and retry customization via the client's .transport attribute, which is an instance of the RequestsTransport class [documentation link].
There are several customization options exposed as attributes, but I think that the following will be helpful in this situation:
.TRANSIENT_ERROR_STATUS_CODES
.retry_backoff()
.max_retries
Looking at the archivetar code, it may be that code like this will accommodate longer retries, and enforce retries on HTTP 404:
# After instantiating the TransferClient
# --------------------------------------
# Add HTTP 404 as a status code that should be retried.
self.tc.transport.TRANSIENT_ERROR_STATUS_CODES += (404, )
# Retry once per second, without any backoff.
self.tc.transport.retry_backoff = lambda *_, **__: 1.0
# Allow up to 100 retries.
# This may result in more than 2 minutes of retries.
self.tc.transport.max_retries = 100
This will result in several minutes of retries before an exception is raised.
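As a rough check on how long those settings keep retrying, the arithmetic can be sketched as below. This assumes one initial request plus max_retries retries, with one fixed sleep before each retry; per_request_seconds is a hypothetical average duration of a failed request, not something the SDK reports.

```python
def min_retry_window(max_retries, backoff_seconds, per_request_seconds=0.0):
    """Estimate seconds spent before the final failure is raised.

    Assumes one initial attempt plus max_retries retries, with a fixed
    sleep of backoff_seconds before each retry (an illustrative model).
    """
    attempts = max_retries + 1  # the initial request plus each retry
    sleeping = max_retries * backoff_seconds
    return sleeping + attempts * per_request_seconds


# Sleep time alone for the settings above: 100 retries, 1 second apart.
print(min_retry_window(100, 1.0))
```

Under this model the sleeps alone account for 100 seconds, so the total only reaches several minutes if each failed request itself takes noticeable time, for example while waiting on a connection timeout.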