
Unable to Connect to SQS when using a VPC

Open victorsantosdevops opened this issue 5 years ago • 20 comments

When I try to send an SQS message from a Lambda function in a VPC, I get a timeout. I tried using the VPC Link, but it doesn't work. { "errorMessage": "2019-03-07T13:45:11.739Z 7cb1fd0f-7b84-4fcd-8775-01f0f374a0a9 Task timed out after 15.01 seconds" }

The security group outbound rules are fully open, and so is the NACL. I already created the VPC Link.

Function Logs:

[INFO] 2019-03-07T13:44:56.744Z 7cb1fd0f-7b84-4fcd-8775-01f0f374a0a9 Start with Hash: 1111114502ff8532d063b9d988e2406a
[INFO] 2019-03-07T13:44:56.744Z 7cb1fd0f-7b84-4fcd-8775-01f0f374a0a9 msgData: {'msgBody': 'Howdy @ 2019-03-07 13:44:56', 'msgAttributes': {'hash': {'StringValue': '1111114502ff8532d063b9d988e2406a', 'DataType': 'String'}}}
2019-03-07 13:45:11.739 7cb1fd0f-7b84-4fcd-8775-01f0f374a0a9 Task timed out after 15.01 seconds

If I remove the VPC, everything works fine... but I need this function to work inside a VPC. Can anyone help me, please? T_T

victorsantosdevops avatar Mar 07 '19 14:03 victorsantosdevops

I'm having the same problem. I can access KMS and SSM properly, just not SQS.

SteveByerly avatar Mar 08 '19 18:03 SteveByerly

I finally figured this out.

In order for the routes to work properly, you need to use a specific URL for the API calls, as noted in the docs. The SQS metadata hasn't been updated in a long time, so it does not use this updated URL scheme.

The solution was not clear to me originally, since the argument to the send_message method is itself a URL - which I verified was in the proper format. The URL in question is the one the API call is sent to; the queue URL is just one of the API call's params.

So the fix is to override endpoint_url when creating your client/resource:

import json

import boto3

session = boto3.Session()

# Point the client explicitly at the new-style regional endpoint.
sqs_client = session.client(
    service_name='sqs',
    endpoint_url='https://sqs.us-east-1.amazonaws.com',
)

sqs_client.send_message(
    QueueUrl='https://sqs.us-east-1.amazonaws.com/...',
    MessageBody=json.dumps('my payload'),
)

SteveByerly avatar Mar 08 '19 19:03 SteveByerly

So the reason we use the alternate endpoint style is to support Python 2.6, which does not support SNI; SNI is required for the new endpoints. We would need to drop support for Python 2.6-2.7.8. Even then it would still be a breaking change, because people have whitelists for particular URLs, and changing what we use would break them.

One possibility in the short term is to add a configuration setting to switch over to the new endpoints.

JordonPhillips avatar Mar 11 '19 17:03 JordonPhillips

That makes sense. I don't necessarily think configuration would be better since the user would still need to know about the configuration options.

A warning in the docs would be a good start, perhaps at the top of the page and in each relevant section. I looked at the docs several times for a clue while working through this - that would likely have resolved it quickly.

Another idea would be to log a warning if the user is on Python 2.7.9+, is using a new-style URL for the queue_url, and has not set endpoint_url.
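Roughly, such a check might look like the sketch below (illustrative names only, not botocore internals):

import logging
import ssl
import sys

logger = logging.getLogger(__name__)

# New-style queue URLs look like https://sqs.<region>.amazonaws.com/...
NEW_STYLE_PREFIX = 'https://sqs.'

def maybe_warn(queue_url, endpoint_url=None):
    """Warn when a new-style queue URL is paired with the legacy endpoint."""
    sni_available = getattr(ssl, 'HAS_SNI', False) and sys.version_info >= (2, 7, 9)
    if sni_available and queue_url.startswith(NEW_STYLE_PREFIX) and endpoint_url is None:
        logger.warning(
            'Queue URL %s uses the new endpoint style, but requests will go to '
            'the legacy queue.amazonaws.com endpoint; consider passing '
            'endpoint_url when creating the client.', queue_url)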

Thanks for following up!

SteveByerly avatar Mar 11 '19 23:03 SteveByerly

Any updates or plans for tackling this issue? We're stuck on older versions of boto3 so we can work with SQS inside our VPCs.

dt-kylecrayne avatar Apr 24 '19 18:04 dt-kylecrayne

@SteveByerly thanks much for https://github.com/boto/boto3/issues/1900#issuecomment-471047309

And I think a warning in the docs/logs would be good.

michaelwills avatar Sep 24 '19 08:09 michaelwills

@SteveByerly, you're my hero.

Second that. The docs absolutely do not cover this (it seems to apply to SQS only), and I burned 8 hours trying to figure it out.

Jon-AtAWS avatar Nov 21 '19 00:11 Jon-AtAWS

I want to add an observation: it seems this is not even consistent across regions. I had the same code with the same setup working in one region but failing in another, which sent me off investigating networking problems.

Overriding the endpoint URL works in both regions, but the default sqs_client = boto3.client('sqs') works in only one. A real head scratcher, I'm telling you.

oleksii-donoha avatar Dec 06 '19 14:12 oleksii-donoha

The proposed solution with the additional endpoint_url doesn't seem to solve the problem in our case. Just to be sure: it's the same hostname as the queue URL, without the path, etc.? So given QueueUrl https://sqs.eu-central-1.amazonaws.com/1234567/queue-name, the endpoint_url would be https://sqs.eu-central-1.amazonaws.com?
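For reference, deriving it that way just means stripping the path with the standard library:

from urllib.parse import urlparse

queue_url = 'https://sqs.eu-central-1.amazonaws.com/1234567/queue-name'
parsed = urlparse(queue_url)
# Keep scheme and host only; the account id and queue name are part of the path.
endpoint_url = '{}://{}'.format(parsed.scheme, parsed.netloc)
# -> 'https://sqs.eu-central-1.amazonaws.com'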

christophevg avatar Jan 14 '20 20:01 christophevg

To avoid confusion, a quick follow-up: our problem was related to the Lambda not having access rights to the public SQS endpoint. After fixing that, simply using sqs_client = boto3.client('sqs') worked as expected.

christophevg avatar Jan 15 '20 11:01 christophevg

Any updates on this one? I'm trying to run SQS and Celery in AWS with a VPC endpoint (no NAT gateways). Celery initializes the boto3 client with default parameters, and it's not possible to modify the boto3 client initialization code to set the endpoint_url parameter to the right URL. I checked that sending a message directly with boto3 and setting endpoint_url works, but with Celery the connection times out, because it tries to connect using the default (legacy) endpoint, which is not supported with VPC endpoints. AWS ref: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-sending-messages-from-vpc.html

marianobrc avatar Jul 03 '21 02:07 marianobrc

@dt-kylecrayne I'm having the same issue, which boto3 version is working for you with SQS inside your VPCs? Thanks

marianobrc avatar Jul 03 '21 02:07 marianobrc

I found the following workaround, overriding boto's settings in endpoints.json (a sketch automating these steps follows the list):

  1. Copy .venv/lib/python3.8/site-packages/botocore/data/endpoints.json to a known path inside a directory (your path may differ depending on where boto is installed).
  2. Edit the file and replace any reference to "queue.{dnsSuffix}" with "sqs.{region}.{dnsSuffix}". This changes the endpoint URL format.
  3. Also edit "protocols" : [ "http", "https" ], removing "http". SQS VPC endpoints only work over HTTPS.
  4. Set the env var AWS_DATA_PATH=/directory/containing/your/file/ to tell boto to read settings from there first.
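If it helps, here is a rough Python sketch automating these steps (the /tmp/aws-data path is illustrative, and the exact keys patched may vary by botocore version):

import json
import os

import botocore

# Step 1: locate botocore's bundled endpoints.json and parse it.
src = os.path.join(os.path.dirname(botocore.__file__), 'data', 'endpoints.json')
with open(src) as f:
    data = json.load(f)

for partition in data.get('partitions', []):
    sqs = partition.get('services', {}).get('sqs', {})
    defaults = sqs.get('defaults', {})
    # Step 2: rewrite any legacy "queue.{dnsSuffix}" host reference to the
    # new-style template, wherever it appears for sqs.
    for entry in [defaults] + list(sqs.get('endpoints', {}).values()):
        for key in ('sslCommonName', 'hostname'):
            if 'queue.{dnsSuffix}' in entry.get(key, ''):
                entry[key] = 'sqs.{region}.{dnsSuffix}'
    # Step 3: drop plain "http"; the SQS VPC endpoints only speak HTTPS.
    if 'protocols' in defaults:
        defaults['protocols'] = ['https']

# Step 4: write the patched copy where AWS_DATA_PATH will point.
dst_dir = '/tmp/aws-data'
os.makedirs(dst_dir, exist_ok=True)
with open(os.path.join(dst_dir, 'endpoints.json'), 'w') as f:
    json.dump(data, f)
# Then run your program with AWS_DATA_PATH=/tmp/aws-data set.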

I hope this helps someone else until this gets fixed.

marianobrc avatar Jul 04 '21 14:07 marianobrc

This would be quite simple to fix within botocore. The offending line is line 467 in client.py. A simple check of the Python version, or of ssl.HAS_SNI, to choose either the sslCommonName or the hostname should do it. Currently that line simply chooses sslCommonName if it exists, and hostname otherwise. For SQS and a couple of other services, the sslCommonName always exists in current botocore.
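In rough terms, the check could look something like this (a sketch, not the actual botocore source; resolved stands for the resolved endpoint data):

import ssl

def choose_hostname(resolved):
    """Prefer the new-style hostname whenever the runtime can do SNI."""
    if ssl.HAS_SNI:
        # Modern interpreters negotiate SNI, so the new-style hostname
        # (e.g. sqs.us-east-1.amazonaws.com) is safe to use.
        return resolved.get('hostname')
    # Fall back to the legacy name only for SNI-less runtimes (Python <= 2.7.8).
    return resolved.get('sslCommonName', resolved.get('hostname'))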

Until this gets fixed (as I said, it should be simple), I've created a microlibrary that implements a variation of the solution @marianobrc described directly above. You can find it here: https://pypi.org/project/awsserviceendpoints/

joseph-wortmann avatar Aug 19 '21 21:08 joseph-wortmann

Any updates on a fix for this?

willronchetti avatar Oct 20 '21 13:10 willronchetti

This also results in mismatched data between CLI and boto3 API usage, as the CLI somehow knows how to use the correct endpoint (sqs.<region>) while boto3 doesn't and uses the legacy one. When querying a queue URL, the service returns it based on the host that was accessed, so now we have data inconsistencies as well.

❯ aws sqs list-queues
{
    "QueueUrls": [
        "https://sqs.us-east-2.amazonaws.com/123456785098/assetdb-ftest-cvKP",
        "https://sqs.us-east-2.amazonaws.com/123456785098/dev_policy_deploys",
        "https://sqs.us-east-2.amazonaws.com/123456785098/dev_policy_deploys_dlq",
        "https://sqs.us-east-2.amazonaws.com/123456785098/local-assetdb",
        "https://sqs.us-east-2.amazonaws.com/123456785098/test",
        "https://sqs.us-east-2.amazonaws.com/123456785098/test2",
        "https://sqs.us-east-2.amazonaws.com/123456785098/test3",
        "https://sqs.us-east-2.amazonaws.com/123456785098/test4",
        "https://sqs.us-east-2.amazonaws.com/123456785098/test5"
    ]
}

❯ python
Python 3.10.0 (default, Oct  5 2021, 06:12:41) [GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import boto3
>>> import pprint
>>> pprint.pprint(boto3.client('sqs').list_queues())
{'QueueUrls': ['https://us-east-2.queue.amazonaws.com/123456785098/assetdb-ftest-cvKP',
               'https://us-east-2.queue.amazonaws.com/123456785098/dev_policy_deploys',
               'https://us-east-2.queue.amazonaws.com/123456785098/dev_policy_deploys_dlq',
               'https://us-east-2.queue.amazonaws.com/123456785098/local-assetdb',
               'https://us-east-2.queue.amazonaws.com/123456785098/test',
               'https://us-east-2.queue.amazonaws.com/123456785098/test2',
               'https://us-east-2.queue.amazonaws.com/123456785098/test3',
               'https://us-east-2.queue.amazonaws.com/123456785098/test4',
               'https://us-east-2.queue.amazonaws.com/123456785098/test5'],
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '989',
                                      'content-type': 'text/xml',
                                      'date': 'Thu, 18 Nov 2021 13:04:54 GMT',
                                      'x-amzn-requestid': '554b37b9-02bd-5e12-ad5a-6da9530bfb45'},
                      'HTTPStatusCode': 200,
                      'RequestId': '554b37b9-02bd-5e12-ad5a-6da9530bfb45',
                      'RetryAttempts': 0}}
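Until there's a proper fix, a rough client-side workaround is to normalize the legacy URLs before persisting anything (a sketch; the regex only covers the standard aws partition):

import re

# Matches legacy-style queue URLs such as https://us-east-2.queue.amazonaws.com/...
LEGACY_QUEUE_URL = re.compile(r'^https://([a-z0-9-]+)\.queue\.amazonaws\.com/')

def normalize_queue_url(url):
    """Rewrite a legacy queue URL to the new sqs.<region> endpoint style."""
    return LEGACY_QUEUE_URL.sub(r'https://sqs.\1.amazonaws.com/', url)

# normalize_queue_url('https://us-east-2.queue.amazonaws.com/123456785098/test')
# -> 'https://sqs.us-east-2.amazonaws.com/123456785098/test'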

It feels like madness to me that the SDK is forcing all its users to work around it.

Is there a sane default configuration that doesn't require manually passing in the endpoint? I.e., how is the awscli doing the right thing?

Can we get an environment flag similar to the STS regional endpoints one?

kapilt avatar Nov 17 '21 11:11 kapilt

I resolved the issue by putting the Lambda function in a private subnet and allowing internet access through a NAT gateway.

VPC -> create private subnets -> create a NAT Gateway in a public subnet -> attach the private subnets to the NAT Gateway -> update the VPC settings in the Lambda configuration.

import boto3

session = boto3.Session(region_name="ca-central-1")
sqs = session.client(
    service_name='sqs',
    endpoint_url='https://sqs.ca-central-1.amazonaws.com',
)

AbdulBasitKhaleeq avatar Jul 06 '22 18:07 AbdulBasitKhaleeq

I have had a Lambda function sending messages to an SQS queue, configured with a VPC; it had been working normally for several months, but now, out of nowhere, no messages are sent and the function times out. The Lambda function is in a private subnet.

sejr1996 avatar Apr 18 '23 16:04 sejr1996

Changing the security group ingress rules to allow all traffic works. Previously the configuration allowed access through ports 22 and 2049; which port should be added for SQS queues to work correctly?

sejr1996 avatar Apr 18 '23 17:04 sejr1996

Changing the security group ingress rules to allow all traffic works. Previously the configuration allowed access through ports 22 and 2049; which port should be added for SQS queues to work correctly?

Same thing happened to me. Lambda running with the VPC set up, and an endpoint created so the resources within the VPC can access SQS endpoints. All had been working fine for years. Suddenly the Lambdas started to time out and couldn't resolve SQS endpoints. I opened the doors as @sejr1996 mentioned as a last resort, and it has worked for now.

dfloresxyon avatar Dec 21 '23 18:12 dfloresxyon