Not able to list files in s3 bucket using glob.
TensorFlow version: 2.6.0
TensorFlow IO version: 0.21.0
import os
import tensorflow as tf
import tensorflow_io as tfio  # imported for its side effect; s3:// paths are not handled without it
os.environ["AWS_REGION"] = "us-east-1"
os.environ["AWS_ACCESS_KEY_ID"] = "ACCESS_KEY"
os.environ["AWS_SECRET_ACCESS_KEY"] = "SECRET_KEY"
os.environ["S3_ENDPOINT"] = "http://localhost:4566"
BUCKET = "dataset"
tf.io.gfile.glob("s3://{}/train/*.tfrec".format(BUCKET))
I've been banging my head against this, and I'm surprised it's not thoroughly documented. Make sure the underlying AWS SDK can actually talk to AWS by turning up the logging and inspecting the logs:
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "0"
os.environ["TF_CPP_MIN_VLOG_LEVEL"] = "5"
os.environ["AWS_LOG_LEVEL"] = "trace"
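In case it helps, here is a minimal sketch that wraps the three logging variables above in a helper. The name enable_s3_debug_logging and the env parameter are hypothetical, not part of TensorFlow. To be safe, call it before importing tensorflow, since I'm assuming the log level is picked up early.

```python
import os

# Variable names and values are the ones from the comment above.
DEBUG_ENV = {
    "TF_CPP_MIN_LOG_LEVEL": "0",   # show all TensorFlow C++ log levels
    "TF_CPP_MIN_VLOG_LEVEL": "5",  # very verbose VLOG output
    "AWS_LOG_LEVEL": "trace",      # most detailed AWS logging
}


def enable_s3_debug_logging(env=None):
    """Copy the debug settings into `env` (os.environ by default).

    Hypothetical helper: passing a plain dict makes it easy to inspect
    what would be set without touching the real environment.
    """
    env = os.environ if env is None else env
    env.update(DEBUG_ENV)
    return env
```

Call enable_s3_debug_logging() at the very top of the script, then import tensorflow and tensorflow_io and retry the glob. The output is very noisy, so it is probably worth removing once authentication works.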
I figured the AWS SDK reads the profile settings (and credentials) differently from my aws-cli, so while aws s3 ls worked fine, tf.io.gfile.glob was not able to authenticate with AWS. Btw, I also set os.environ['AWS_PROFILE'] = 'my-aws-profile' to make sure the right profile gets picked up.
~Having said that, I've managed to get that working with tf==2.6.0 and tf-io=0.20.0 or 0.21.0 but not at all with tf < 2.6 🤔~
Update: Thanks to checking the logs 🙂, it turns out that with tensorflow==2.5.0 and tensorflow-io==0.19.1 all I need to do is set the region and S3 endpoint manually to get it working, while tf==2.6.0 and tf-io==0.20.0 work without them 🤔🤔🤔.
os.environ['AWS_REGION'] = 'eu-west-1'
os.environ['S3_ENDPOINT'] = 'https://s3.eu-west-1.amazonaws.com'
Also, I learned that tf.io.gfile.glob just returns an empty list if I don't import tensorflow_io as tfio.
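Putting the pieces from this thread together, a rough sketch might look like the following. The helpers configure_s3 and glob_s3 are hypothetical names of mine, and the region, endpoint and profile values are just the examples used above. The TensorFlow imports are deferred into the function so the environment is set first; whether that ordering is strictly required is an assumption on my part.

```python
import os


def configure_s3(region, endpoint, profile=None, env=None):
    """Set the S3-related variables mentioned in this thread.

    Hypothetical helper: `env` defaults to os.environ, but a plain dict
    can be passed to check the values without changing the process.
    """
    env = os.environ if env is None else env
    env["AWS_REGION"] = region
    env["S3_ENDPOINT"] = endpoint
    if profile is not None:
        env["AWS_PROFILE"] = profile
    return env


def glob_s3(pattern):
    """List objects matching an s3:// glob pattern."""
    # Deferred imports so configure_s3() can run before TensorFlow loads.
    import tensorflow as tf
    import tensorflow_io as tfio  # noqa: F401  imported for its side effect only

    matches = tf.io.gfile.glob(pattern)
    if not matches:
        # An empty list can mean "no such files", but in this thread it also
        # showed up when tensorflow_io was not imported or auth failed.
        print("No matches for", pattern, "(check the logs for auth or endpoint errors)")
    return matches
```

Usage would be something like configure_s3("eu-west-1", "https://s3.eu-west-1.amazonaws.com", profile="my-aws-profile") followed by glob_s3("s3://dataset/train/*.tfrec"). With tf==2.6.0 the region and endpoint may not be needed at all, going by the update above.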