AWS credentials as environment variables not working as expected
I'm trying to load private data from S3 in a Fused UDF, and I want to make sure I'm doing it the "right" way.
I'm following these instructions: https://docs.fused.io/basics/utilities/#environment-variables
In one UDF, I've got this:
```python
env_vars = """
AWS_ACCESS_KEY_ID=AK...
AWS_SECRET_ACCESS_KEY=Gt...
"""

# Path to your .env file
env_file_path = '/mnt/cache/.env'

@fused.udf
def udf(bbox=None, n=10):
    # Write the environment variables to the .env file
    with open(env_file_path, 'w') as file:
        file.write(env_vars)
```
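For a simple file like this one, `load_dotenv` puts each `KEY=VALUE` line into `os.environ`. python-dotenv also provides `dotenv_values(path)`, which returns the parsed values as a dict without modifying `os.environ`. The stdlib-only sketch below approximates that parsing for the simple syntax used here. `parse_env_text` is a hypothetical helper, and it deliberately ignores python-dotenv features such as variable expansion and multi-line values.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines like the .env file written above.

    Hypothetical helper: handles blank lines, '#' comments, an optional
    'export ' prefix and one pair of surrounding quotes only; python-dotenv
    supports more syntax than this.
    """
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment, so ignore it
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values
```

Calling `parse_env_text(env_vars)` on the string above yields the two `AWS_*` keys with their (elided) values, which is a quick way to confirm the file contains exactly what you expect.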
In the second UDF, I've got this:
```python
@fused.udf
def udf():
    import os

    import boto3
    from dotenv import load_dotenv

    # Load the environment variables from the .env file written by the first UDF
    env_file_path = '/mnt/cache/.env'
    load_dotenv(env_file_path, override=True)

    # These are being set correctly
    assert os.environ['AWS_ACCESS_KEY_ID'] == 'AK...'
    assert os.environ['AWS_SECRET_ACCESS_KEY'] == 'Gt...'

    # Doesn't work: S3 credentials are not picked up correctly from the environment
    # botocore.exceptions.ClientError: An error occurred (InvalidToken) when calling the GetObject operation: The provided token is malformed or otherwise invalid.
    # s3 = boto3.client('s3')

    # Does work if I pass the credentials explicitly
    s3 = boto3.client(
        's3',
        aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
        aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'],
    )

    bucket = "arraylake-earthmover-production"
    key = "6462e90c27af040cabc066e8/chunks/0081af97634c03fc1c3fcd16b1f3c196558c15c096674f5a0052bf25479d0e8b.00000000000000000000000000000000"
    obj = s3.get_object(Bucket=bucket, Key=key)
    print(obj)
```
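The explicit-credentials workaround can also take the key pair straight from the .env file instead of from `os.environ`, so nothing inherited from the runtime environment ends up in the client. A minimal sketch, assuming the file uses the variable names above; `explicit_s3_credentials` is a hypothetical helper, while `aws_access_key_id`, `aws_secret_access_key` and `aws_session_token` are the keyword arguments `boto3.client()` accepts.

```python
from typing import Mapping


def explicit_s3_credentials(values: Mapping[str, str]) -> dict:
    """Map .env-style variable names to boto3.client() keyword arguments.

    Hypothetical helper. A session token is included only when the given
    mapping provides one, so pass it the parsed .env file rather than
    os.environ if you want to avoid picking up a token set elsewhere.
    """
    kwargs = {
        "aws_access_key_id": values["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": values["AWS_SECRET_ACCESS_KEY"],
    }
    token = values.get("AWS_SESSION_TOKEN")
    if token:
        kwargs["aws_session_token"] = token
    return kwargs
```

Inside the second UDF this could be used as `boto3.client('s3', **explicit_s3_credentials(dotenv_values(env_file_path)))`, with `dotenv_values` imported from `dotenv`.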
In most Python environments, boto3 picks up credentials from environment variables automatically, without having to pass them explicitly (see https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#environment-variables). In a Fused UDF, however, this does not work: if I don't pass the credentials explicitly, I get the "The provided token is malformed or otherwise invalid" error.
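One possible cause, not confirmed here: boto3's environment provider reads `AWS_SESSION_TOKEN` (and the older `AWS_SECURITY_TOKEN`) in addition to the key pair. If the UDF runtime already sets a session token for its own role, boto3 would combine that token with the key pair loaded from .env, and S3 would reject the mismatch with `InvalidToken`. That would also explain why the workaround succeeds, since explicitly passed keys are used without any session token. A quick stdlib check, using the fact that long-term IAM user key IDs start with `AKIA` and temporary STS key IDs start with `ASIA`; `credential_mismatch` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# AWS access key ID prefixes: long-term IAM user keys vs. temporary STS keys
LONG_TERM_PREFIX = "AKIA"
TEMPORARY_PREFIX = "ASIA"


def credential_mismatch(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a warning string if the AWS env vars look inconsistent, else None."""
    key_id = environ.get("AWS_ACCESS_KEY_ID", "")
    # botocore's environment provider checks both of these token variables
    token = environ.get("AWS_SESSION_TOKEN") or environ.get("AWS_SECURITY_TOKEN")
    if key_id.startswith(LONG_TERM_PREFIX) and token:
        return "long-term key (AKIA...) combined with a session token from the environment"
    if key_id.startswith(TEMPORARY_PREFIX) and not token:
        return "temporary key (ASIA...) without a session token"
    return None
```

Running `print(credential_mismatch())` right after `load_dotenv(...)` in the second UDF would show whether a leftover token is present. `boto3.Session().get_credentials()` is another check: its `method` attribute names the provider boto3 used (for example `'env'`).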
This is obviously not a huge problem. The workaround (passing the credentials explicitly) is easy enough, but I'm opening this issue to better understand what is going on here.