Metaflow steps fail when aws credentials are provided as environment variables
- I am trying to run a Metaflow flow on a Kubernetes cluster (specifically a kind cluster running on my laptop).
- AWS creds are provisioned inside the K8s cluster as a secret called aws-creds. The secret holds the access key, secret key, and AWS region in KEY=VALUE format.
- I ran Metaflow with the following options:
$ METAFLOW_DEFAULT_DATASTORE=s3 \
METAFLOW_DATASTORE_SYSROOT_S3=s3://my-bucket-name \
python3 c.py run --with kubernetes:secrets=aws-creds
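For reference, the aws-creds secret referenced by `--with kubernetes:secrets=aws-creds` can be created from a manifest roughly like this (a sketch with placeholder values; the variable names match the env output below):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-creds
type: Opaque
# stringData lets you write the values in plain KEY=VALUE form;
# Kubernetes base64-encodes them into .data on creation.
stringData:
  AWS_ACCESS_KEY_ID: "<access-key>"
  AWS_SECRET_ACCESS_KEY: "<secret-key>"
  AWS_DEFAULT_REGION: "us-west-2"
```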
- Metaflow created a Kubernetes job and Kubernetes in turn created the pod.
- When ssh'd into the pod, the AWS credentials are correctly present as env variables:
root@t-4v4s4-mdmw9:/# env | grep AWS
AWS_ACCESS_KEY_ID=<REDACTED_KEY>
AWS_SECRET_ACCESS_KEY=<REDACTED_SECRET>
AWS_DEFAULT_REGION=us-west-2
METAFLOW_DEFAULT_AWS_CLIENT_PROVIDER=boto3
- However, the pod eventually fails with the following error:
Setting up task environment.
Downloading code package...
fatal error: Unable to locate credentials
- To confirm that these env variables are correct, I ran a plain Ubuntu container, set the same creds as env variables, and ran a simple boto3 script; it worked just fine.
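A minimal sketch of that sanity check (simplified here to a stdlib-only check that the variables are set; the real script additionally called boto3 against S3, and the function name is mine):

```python
import os

# The three variables the aws-creds secret injects into the pod.
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")

def missing_aws_env(env=None):
    """Return the names of required AWS credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]

if __name__ == "__main__":
    gaps = missing_aws_env()
    print("all AWS credential variables present" if not gaps else f"missing: {gaps}")
```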
Looks like there might be something within Metaflow that prevents AWS creds provided as env variables from taking effect.
Hi there,
I have experienced the exact same error with a very similar setup:
- kind cluster as local Kubernetes
- MinIO in Docker as local S3
Both should be configured properly: S3 without Kubernetes works as expected. But the moment I run --with kubernetes
with the parameter configured in .metaflowconfig as
DATATOOLS_CLIENT_PARAMS={"aws_access_key_id": "xxx", "aws_secret_access_key": "xxx"}
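For context, the relevant entries in ~/.metaflowconfig/config.json would look roughly like this (placeholder values; the METAFLOW_-prefixed key name and the MinIO endpoint_url are my assumptions about the intended setup, and boto3's client kwargs are aws_access_key_id / aws_secret_access_key):

```json
{
    "METAFLOW_DEFAULT_DATASTORE": "s3",
    "METAFLOW_DATASTORE_SYSROOT_S3": "s3://my-bucket-name",
    "METAFLOW_DATATOOLS_CLIENT_PARAMS": "{\"aws_access_key_id\": \"xxx\", \"aws_secret_access_key\": \"xxx\", \"endpoint_url\": \"http://minio:9000\"}"
}
```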
the exact same error appears:
Setting up task environment.
Downloading code package...
fatal error: Unable to locate credentials
Any ideas why this happens? @shrinandj have you solved this issue differently? Any help would be highly appreciated. Many thanks for this cool project.
Cheers
Experiencing something similar on an EKS cluster with IRSA auth. boto3 and the CLI both work correctly in the container, but I get an error when the pod tries to connect to S3 as part of a run. I'm not getting any logs, so it's proving quite tricky to track down.
I'm running into this same issue and wondering if anyone's found a solution