
table_import_from_s3 in localstack has Access Key error

Open evbo opened this issue 1 year ago • 2 comments

I am trying to import s3 data from localstack, using:

select aws_s3.table_import_from_s3(
'tablename',
'col1,col2',
'(format csv, header true)', 
aws_commons.create_s3_uri('my-bucket','test.csv','us-west-2'),
aws_commons.create_aws_credentials('none','none',''),
'http://localstack:4566');

'none' is what I use as the access key and secret for all localstack calls, but using it here results in:

ERROR: spiexceptions.ExternalRoutineException: botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.
CONTEXT: Traceback (most recent call last):
  PL/Python function "table_import_from_s3", line 7, in
    return plan.execute(
PL/Python function "table_import_from_s3"
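
For context, here is what a truly credential-less request looks like at the boto3 level. This is just a sketch of the unsigned-request approach (not something I know postgres-aws-s3 to support today), reusing the bucket, region, and localstack endpoint from the failing call above:

# Sketch only: anonymous (unsigned) S3 access with boto3, i.e. no access key at all.
# Bucket, region, and endpoint are the same ones used in the failing call above.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

anonymous = boto3.client(
    's3',
    region_name='us-west-2',
    endpoint_url='http://localstack:4566',
    config=Config(signature_version=UNSIGNED),  # skip request signing entirely
)

# Succeeds as long as the bucket policy allows public s3:GetObject
obj = anonymous.get_object(Bucket='my-bucket', Key='test.csv')
print(obj['ContentLength'])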

I have ensured this bucket is publicly accessible via:

AWS_PAGER="" \
AWS_ACCESS_KEY_ID=none \
AWS_SECRET_ACCESS_KEY=none \
aws \
  --endpoint-url=http://localstack:4566 \
  --region=us-west-2 \
  s3api put-bucket-policy \
    --bucket my-bucket \
    --policy '{
      "Id": "Policy1397632521960",
      "Statement": [
        {
          "Sid": "Stmt1397633323327",
          "Action": [
            "s3:GetObject"
          ],
          "Effect": "Allow",
          "Resource": "arn:aws:s3:::my-bucket/*",
          "Principal": {
          "AWS": [
            "*"
          ]
        }
      }
    ]
  }'
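
To double-check that the policy was actually attached, I can read it back from Python as well; a small sketch using the same dummy credentials and endpoint:

# Sketch: read the bucket policy back to confirm the put-bucket-policy call took effect.
import boto3

s3 = boto3.client(
    's3',
    region_name='us-west-2',
    endpoint_url='http://localstack:4566',
    aws_access_key_id='none',
    aws_secret_access_key='none',
)

resp = s3.get_bucket_policy(Bucket='my-bucket')
print(resp['Policy'])  # JSON string containing the public s3:GetObject statement above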

evbo · Aug 01 '22 23:08

@huiser I've also tried adding a secret key of 'none', and I've verified I can publicly access the files using wget without credentials. So is there a way for aws_s3 to accept no credentials at all, or 'none' as I've shown above?

wget http://localstack:4566/my-bucket/test.csv
Resolving localstack (localstack)... 172.25.0.3
Connecting to localstack (localstack)|172.25.0.3|:4566... connected.
HTTP request sent, awaiting response... 200 
Length: 133635 (131K) [text/csv]
Saving to: 'test.csv'

and if I use boto3 locally, it also fetches the object successfully:

import boto3

s3client = boto3.client(
    's3',
    region_name='us-west-2',
    endpoint_url='http://localstack:4566',
    aws_access_key_id='none',
    aws_secret_access_key='none',
    aws_session_token='none',
)

print(s3client.get_object(Bucket='my-bucket', Key='test.csv'))
{'ResponseMetadata': {'RequestId': '1I8DPGPWYI21YKAKHO9Z5Y33OSA74FQU5UTOQ08LJCHZFUJ9TGY7', 'HTTPStatusCode': 200, ...
...
...

So if boto3 can connect in my simple example above, do you think this is a bug in postgres-aws-s3?

Besides this possibly being a bug, I see a lot of benefit in supporting credential-less calls: credentials are optional in Amazon's aws_s3 API, so adding this feature would also keep the extension a consistent mirror of their product.
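
For comparison, this is roughly what the credential-less form of Amazon's aws_s3 API looks like on RDS, where the DB instance's IAM role supplies access instead of explicit keys. The snippet is only a sketch driven from Python, and the connection parameters are placeholders rather than values from this setup:

# Sketch of Amazon's credential-less call form (the RDS instance's IAM role supplies access).
# Connection details below are placeholders; table, bucket, and key names are from above.
import psycopg2

conn = psycopg2.connect(host='localhost', dbname='mydb', user='postgres', password='postgres')
with conn, conn.cursor() as cur:
    cur.execute("""
        select aws_s3.table_import_from_s3(
            'tablename',
            'col1,col2',
            '(format csv, header true)',
            aws_commons.create_s3_uri('my-bucket', 'test.csv', 'us-west-2')
        );
    """)
    print(cur.fetchone())
conn.close()

Whether postgres-aws-s3 could accept the same form is essentially what I'm asking for here.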

Also, the reason I don't use credentials is that I only communicate internally within my VPC (between RDS, S3, and Lambda). I am using an external (local) Postgres database only because localstack currently doesn't have full/free support for RDS; thanks to Docker, I can bridge connections to a local Postgres database that is otherwise identical to RDS.

evbo · Aug 04 '22 16:08

I found an alternative overload that lets me specify the bucket, file path, and region separately. When I use it, the import works: https://github.com/chimpler/postgres-aws-s3/blob/b817be9caf54e5b09c5c6edb924cf1b17df0e75c/aws_s3--0.0.1.sql#L41
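
Roughly, the working call looks like this, sketched from Python. The argument order follows my reading of the linked aws_s3--0.0.1.sql, so double-check it against that file; the connection parameters are placeholders:

# Sketch of the separate-argument overload that works for me.
# Connection details are placeholders; verify the argument order against aws_s3--0.0.1.sql.
import psycopg2

conn = psycopg2.connect(host='localhost', dbname='mydb', user='postgres', password='postgres')
with conn, conn.cursor() as cur:
    cur.execute("""
        select aws_s3.table_import_from_s3(
            'tablename',
            'col1,col2',
            '(format csv, header true)',
            'my-bucket',                -- bucket
            'test.csv',                 -- file_path
            'us-west-2',                -- region
            'none',                     -- access_key
            'none',                     -- secret_key
            '',                         -- session_token
            'http://localstack:4566'    -- endpoint_url
        );
    """)
    print(cur.fetchone())
conn.close()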

It's not ideal, though, since it doesn't mirror my production code, which uses the s3_uri object instead. It's also not clear to me why this overload works while the s3_uri method doesn't. Thanks for any help with this, and I hope the feedback is useful.

This is a great project! Thanks for sharing :)

evbo · Aug 04 '22 16:08