botocore icon indicating copy to clipboard operation
botocore copied to clipboard

Misalignment between custom botocore.config retry parameters and the time it takes to raise ConnectTimeoutError

Open mgilar opened this issue 1 year ago • 2 comments

Describe the bug

When connection cannot be stablished using a CloudWatch boto3 client, the time it takes to raise the ConnectTimeoutError does not match the time it should take given the custom parameters defined using the botocore.config Config class.

The time to raise the connection error consistently takes three times the max. number of retrials or max timeout time.

Regression Issue

  • [ ] Select this option if this issue appears to be a regression.

Expected Behavior

Given the following custom configuration to be applied in a CloudWatch boto3 client:

import boto3
from botocore.config import Config

custom_config = Config(
    connect_timeout=1,  # Time to establish connection
    read_timeout=1,  # Time to wait for a response after connection
    retries={
        "total_max_attempts": 1,
        "mode": "standard"
    },
)

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1", config=custom_config)

After testing the retrials behaviour when the connection cannot be stablished, with different parameter configurations I obtain the following execution times (time difference between cloudwatch.PutMetricData is logged and the ConnectTimeoutError is raised):

connect_timeout = 1 read_timeout = 1 total_max_attempts = 1 Execution time: ~1s

connect_timeout = 1 read_timeout = 1 total_max_attempts = 2 Execution time: ~2s

connect_timeout = 1 read_timeout = 1 total_max_attempts = 3 Execution time: ~3s

connect_timeout = 2 read_timeout = 1 total_max_attempts = 1 Execution time: ~2s

Current Behavior

After testing the retrials behaviour when the connection cannot be stablished, with different parameter configurations I obtain the following execution times (time difference between cloudwatch.PutMetricData is logged and the ConnectTimeoutError is raised):

connect_timeout = 1 read_timeout = 1 total_max_attempts = 1 Execution time: ~3s

connect_timeout = 1 read_timeout = 1 total_max_attempts = 2 Execution time: ~7s

connect_timeout = 1 read_timeout = 1 total_max_attempts = 3 Execution time: ~11s

connect_timeout = 2 read_timeout = 1 total_max_attempts = 1 Execution time: ~6s

Reproduction Steps

Create the following custom configuration to be applied in a CloudWatch boto3 client:

import boto3
from botocore.config import Config

custom_config = Config(
    connect_timeout=1,  # Time to establish connection
    read_timeout=1,  # Time to wait for a response after connection
    retries={
        "total_max_attempts": 1,
        "mode": "standard"
    },
)

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1", config=custom_config)

Execute the code from an AWS Service that does not have permissions to access/write Cloudwatch metrics (in my case it was a Sagemaker endpoint).

Package versions

botocore = 1.35.58 boto3 = 1.35.58 urllib3 = 2.2.3

Possible Solution

It looks like the source of the problem (see additional info/context to check full error logs) is related to urllib connection session parameters not being overwritten by custom retry parameters. Therefore urllib connection is executed as many times as its default parameter: 3.

Additional Information/Context

The full error obtained when connection cannot be established is:

2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - conn.connect() 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/venv/lib64/python3.11/site-packages/urllib3/connection.py", line 693, in connect 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - self.sock = sock = self._new_conn() 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ^^^^^^^^^^^^^^^^ 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/venv/lib64/python3.11/site-packages/urllib3/connection.py", line 208, in _new_conn 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - raise ConnectTimeoutError( 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - urllib3.exceptions.ConnectTimeoutError: (<botocore.awsrequest.AWSHTTPSConnection object at 0x7f058020aad0>, 'Connection to monitoring.eu-west-1.amazonaws.com timed out. (connect timeout=2)') 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - During handling of the above exception, another exception occurred: 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last): 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/venv/lib64/python3.11/site-packages/botocore/endpoint.py", line 279, in _do_get_response 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - http_response = self._send(request) 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ^^^^^^^^^^^^^^^^^^^ 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/venv/lib64/python3.11/site-packages/botocore/endpoint.py", line 383, in _send 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - return self.http_session.send(request) 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/venv/lib64/python3.11/site-packages/botocore/httpsession.py", line 499, in send 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - raise ConnectTimeoutError(endpoint_url=request.url, error=e) 2024-11-12T12:28:47,744 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://monitoring.eu-west-1.amazonaws.com/" 2024-11-12T12:28:47,745 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Event needs-retry.cloudwatch.PutMetricData: calling handler <bound method RetryHandler.needs_retry of <botocore.retries.standard.RetryHandler object at 0x7f0580287910>> 2024-11-12T12:28:47,745 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Max attempts of 1 reached. 2024-11-12T12:28:47,745 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Not retrying request. 2024-11-12T12:28:47,745 [INFO ] W-04fd8f971b1bfc2c8d11abb1b-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Failed to save custom metric with error: Connect timeout on endpoint URL: "https://monitoring.eu-west-1.amazonaws.com/"

SDK version used

1.35.58

Environment details (OS name and version, etc.)

Linux

mgilar avatar Nov 12 '24 15:11 mgilar

Hi @mgilar, thank you for reaching out. I was able to get the same error ConnectTimeoutError using your sample code and set up (CloudWatch and SageMaker) but in order to get the error I had to set my "connect_timeout" and "read_timeout" very low as possible (for example 0.000001). When it is at 1, it works fine. For a further look, could you please provide your used case for the test and also to provide the full debug logs by adding the line: boto3.set_stream_logger('') and redacting any sensitive information. And also if possible, to provide the full code that you may have and any further related logs. Thank you.

adev-code avatar Nov 13 '24 21:11 adev-code

Hi @adev-code, let me provide further details on the above mentioned issue:

  • I have deployed a MultiModel Sagemaker Endpoint with custom inference code. I created a boto3 client of cloudwatch within the model handler, as depicted in the bug description, and then use it to log model usage information such as:
import boto3
from botocore.config import Config

boto3.set_stream_logger("", logging.DEBUG)

custom_config = Config(
            connect_timeout=event.get("connect_timeout", 1),  # Time to establish connection
            read_timeout=event.get(
                "read_timeout", 1
            ),  # Time to wait for a response after connection
            retries={
                "max_attempts": 0,
                "total_max_attempts": 1,
                "mode": "standard",
            },
        )
REGION = os.environ.get("AWS_REGION", "eu-west-2")
cloudwatch = boto3.client("cloudwatch", region_name=REGION, config=custom_config)

dimensions = [
            {"Name": "ModelName", "Value": model_name},
            {"Name": "TriggeringEntity", "Value": triggering_entity},
            {"Name": "StatusCode", "Value": str(status_code)},
            {"Name": "Method", "Value": method},
        ]
        response = cloudwatch.put_metric_data(
            Namespace="test",
            MetricData=[
                {
                    "MetricName": "ModelContext",
                    "Dimensions": dimensions,
                    "Timestamp": datetime.utcnow(),
                    "Value": 1,
                    "Unit": "Count",
                    "StorageResolution": 60,
                }
            ],
        )
        # Check response for success
        if response["ResponseMetadata"]["HTTPStatusCode"] == 200:
            log.info("Metric data successfully put to CloudWatch.")
        else:
            log.info("Failed to put metric data to CloudWatch.")

The obtained logs are:

2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/venv/lib64/python3.11/site-packages/urllib3/connectionpool.py", line 789, in urlopen

2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - response = self._make_request( 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ^^^^^^^^^^^^^^^^^^^ 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/venv/lib64/python3.11/site-packages/urllib3/connectionpool.py", line 490, in _make_request 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - raise new_e 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/venv/lib64/python3.11/site-packages/urllib3/connectionpool.py", line 466, in _make_request 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - self._validate_conn(conn) 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/venv/lib64/python3.11/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - conn.connect() 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/venv/lib64/python3.11/site-packages/urllib3/connection.py", line 693, in connect 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - self.sock = sock = self._new_conn() 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ^^^^^^^^^^^^^^^^ 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/venv/lib64/python3.11/site-packages/urllib3/connection.py", line 208, in _new_conn 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - raise ConnectTimeoutError( 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - urllib3.exceptions.ConnectTimeoutError: (<botocore.awsrequest.AWSHTTPSConnection object at 0x7f3bcbd31ad0>, 'Connection to monitoring.eu-west-1.amazonaws.com timed out. (connect timeout=1)') 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - During handling of the above exception, another exception occurred: 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last): 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/venv/lib64/python3.11/site-packages/botocore/endpoint.py", line 279, in _do_get_response 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - http_response = self._send(request) 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ^^^^^^^^^^^^^^^^^^^ 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/venv/lib64/python3.11/site-packages/botocore/endpoint.py", line 383, in _send 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - return self.http_session.send(request) 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/venv/lib64/python3.11/site-packages/botocore/httpsession.py", line 499, in send 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - raise ConnectTimeoutError(endpoint_url=request.url, error=e) 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://monitoring.eu-west-1.amazonaws.com/" 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 2024-11-20 09:45:47,485 botocore.hooks [DEBUG] Event needs-retry.cloudwatch.PutMetricData: calling handler <bound method RetryHandler.needs_retry of <botocore.retries.standard.RetryHandler object at 0x7f3bc838e790>> 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 2024-11-20 09:45:47,485 botocore.retries.standard [DEBUG] Max attempts of 1 reached. 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 2024-11-20 09:45:47,485 botocore.retries.standard [DEBUG] Not retrying request. 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - { "component": "GetAIModelPrediction", "request_id": null, "timestamp": 1732095947000, "level": "ERROR", "type": "app", "msg": "Failed to save custom metric with error: Connect timeout on endpoint URL: "https://monitoring.eu-west-1.amazonaws.com/"" } 2024-11-20T09:45:47,487 [WARN ] W-82500c51c9d78af3b3ebaff28-8-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 2024-11-20 09:45:47,486 libawsutils.ailogger [ERROR] Failed to save custom metric with error: Connect timeout on endpoint URL: "https://monitoring.eu-west-1.amazonaws.com/"

Let me know if I can provide further information

mgilar avatar Nov 20 '24 10:11 mgilar

Thanks for the response. I have repro'd the issue and it seems like Boto3 and Botocore follows the users config. I have only repro'd directly using Boto3, not within an entity. It could be possible that other entity/service may retry or alter the config but could you try by just using Boto3 (with no other service nor in a service) and see the behavior?

I have tried 1 second, 4 second and so on, the Boto3 SDK follows the timeout set.

import boto3
from botocore.config import Config
import time

boto3.set_stream_logger('')

custom_config = Config(
    connect_timeout=1,
    read_timeout=1,
    retries={
        "total_max_attempts": 1,
        "mode": "standard"
    },
)

# Use a non-existent endpoint to force failure
cloudwatch = boto3.client(
    "cloudwatch",
    region_name="eu-west-1",
    endpoint_url="http://192.0.2.1:8080",  # Non-routable IP
    config=custom_config
)

start_time = time.time()
try:
    cloudwatch.list_metrics()
except Exception as e:
    end_time = time.time()
    print(f"Failed after {end_time - start_time:.2f} seconds")
    print(f"Error: {type(e).__name__}: {e}")

version: Boto3/1.40.40 md/Botocore#1.40.40

adev-code avatar Sep 26 '25 21:09 adev-code

Greetings! It looks like this issue hasn’t been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.

github-actions[bot] avatar Oct 07 '25 01:10 github-actions[bot]

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.

github-actions[bot] avatar Oct 07 '25 17:10 github-actions[bot]