RTX-KG2 icon indicating copy to clipboard operation
RTX-KG2 copied to clipboard

Add retry capabilities for `kg2_util.download_file_if_not_exist_locally` to avoid systemic failures

Open d33bs opened this issue 1 year ago • 2 comments

In attempting to recreate and run the RTX-KG2 pipeline I found that kg2_util.download_file_if_not_exist_locally sometimes had a tendency to fail on first attempt to download a resource. When this happens it can sometimes cause a full pipeline failure for singular download errors. I'm unsure of the exact cause (it could be network location based, or file hosting service, etc) but generally found that the network requests were sometimes initially unable to complete and would succeed if tried again. To avoid this, a simple retry loop may provide a programmatic way to retry a download without much time cost when it comes to a full pipeline run. Adding this retry loop I found removed the cases where I saw failures.

Current code (link):

def download_file_if_not_exist_locally(url: str, local_file_name: str):
    if url is not None:
        local_file_path = pathlib.Path(local_file_name)
        if not local_file_path.is_file():
            ctx = ssl.create_default_context()
            ctx.check_hostname = False
            ctx.verify_mode = ssl.CERT_NONE
            # the following code is ugly but necessary because sometimes the TLS
            # certificates of remote sites are broken and some of the PURL'd
            # URLs resolve to HTTPS URLs (would prefer to just use
            # urllib.request.urlretrieve, but it doesn't seem to support
            # specifying an SSL "context" which we need in order to ignore the cert):
            temp_file_name = tempfile.mkstemp(prefix=TEMP_FILE_PREFIX + '-')[1]
            with urllib.request.urlopen(url, context=ctx) as u, open(temp_file_name, 'wb') as f:
                f.write(u.read())
            shutil.move(temp_file_name, local_file_name)
    return local_file_name

Suggested code:

def download_file_if_not_exist_locally(url: str, local_file_name: str):
    # general imports for portability
    import ssl
    import pathlib
    import tempfile
    import urllib
    import urllib.parse
    import urllib.request
    import shutil

    # import http for exception handling
    from http import client

    # set maximum retries possible
    MAX_RETRIES = 3

    if url is not None:
        local_file_path = pathlib.Path(local_file_name)
        if not local_file_path.is_file():
            # create a loop for attempting retries
            for attempt in range(1, MAX_RETRIES + 1):
                try:
                    ctx = ssl.create_default_context()
                    ctx.check_hostname = False
                    ctx.verify_mode = ssl.CERT_NONE
                    # the following code is ugly but necessary because sometimes the TLS
                    # certificates of remote sites are broken and some of the PURL'd
                    # URLs resolve to HTTPS URLs (would prefer to just use
                    # urllib.request.urlretrieve, but it doesn't seem to support
                    # specifying an SSL "context" which we need in order to ignore the cert):
                    temp_file_name = tempfile.mkstemp(prefix=TEMP_FILE_PREFIX + "-")[1]

                    with urllib.request.urlopen(url, context=ctx) as u, open(
                        temp_file_name, "wb"
                    ) as f:
                        f.write(u.read())

                    shutil.move(temp_file_name, local_file_name)

                    # break out of retry loop
                    break

                # catch incomplete http read exception
                except client.IncompleteRead as e:
                    if attempt < MAX_RETRIES:
                        # if we haven't reached maximum retries, try again
                        print(f"Attempt {attempt} failed. Retrying...")
                    else:
                        # If max retries reached, raise the exception
                        raise

    return local_file_name

d33bs avatar Jan 29 '24 21:01 d33bs

Thank you for the patch to implement retry capability, @d33bs ! We will get this into the main branch.

saramsey avatar Feb 11 '24 18:02 saramsey

Sounds good, thank you as well @saramsey ! If it'd be helpful, may I submit a PR towards this cause? I can also understand if you wouldn't suggest it.

d33bs avatar Feb 12 '24 20:02 d33bs