
Not able to list files in s3 bucket using glob.

Open II-VSB-II opened this issue 4 years ago • 1 comments

TensorFlow version: 2.6.0
TensorFlow IO version: 0.21.0

import os

import tensorflow as tf
import tensorflow_io as tfio
os.environ["AWS_REGION"] = "us-east-1"
os.environ["AWS_ACCESS_KEY_ID"] = "ACCESS_KEY"
os.environ["AWS_SECRET_ACCESS_KEY"] = "SECRET_KEY"
os.environ["S3_ENDPOINT"] = "http://localhost:4566"

BUCKET = "dataset"
tf.io.gfile.glob("s3://{}/train/*.tfrec".format(BUCKET))

II-VSB-II avatar Oct 22 '21 03:10 II-VSB-II

I've been banging my head against this, and I'm surprised it's not thoroughly documented. Make sure boto3 can properly talk to AWS by inspecting the logs:

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "0"
os.environ["TF_CPP_MIN_VLOG_LEVEL"] = "5"
os.environ["AWS_LOG_LEVEL"] = "trace"

I figured out that boto3 reads the profile settings (and creds) differently from my aws-cli, so while aws s3 ls worked fine, tf.io.gfile.glob was not able to authenticate with AWS. Btw, I also added os.environ['AWS_PROFILE'] = 'my-aws-profile' to make sure boto3 picks up the right profile.

~Having said that, I've managed to get that working with tf==2.6.0 and tf-io=0.20.0 or 0.21.0 but not at all with tf < 2.6 🤔~

Update: Checking the logs paid off 🙂. It turns out that with tensorflow==2.5.0 and tensorflow-io==0.19.1, all I need to do is provide the region and S3 endpoint manually to get it working, while tf==2.6.0 and tf-io==0.20.0 can do without 🤔🤔🤔.

os.environ['AWS_REGION'] = 'eu-west-1'
os.environ['S3_ENDPOINT'] = 'https://s3.eu-west-1.amazonaws.com'

Also, I learned that tf.io.gfile.glob just returns an empty list if I don't import tensorflow_io as tfio.
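For anyone landing here later, the settings from this comment can be pulled together roughly like this. It's only a sketch: configure_s3_env and list_s3_files are hypothetical helper names (not part of TensorFlow or tensorflow-io), and the region, endpoint, and profile values are just the examples used above. Whether you need the region and endpoint at all seems to depend on the tf / tf-io versions, as noted in the update.

```python
import os


def configure_s3_env(region, endpoint, profile=None, verbose=False):
    """Hypothetical helper: set the environment variables discussed in this thread."""
    settings = {"AWS_REGION": region, "S3_ENDPOINT": endpoint}
    if profile is not None:
        # Optional: pin the AWS profile, as done in the comment above.
        settings["AWS_PROFILE"] = profile
    if verbose:
        # Verbose logging, handy for spotting authentication problems.
        settings.update({
            "TF_CPP_MIN_LOG_LEVEL": "0",
            "TF_CPP_MIN_VLOG_LEVEL": "5",
            "AWS_LOG_LEVEL": "trace",
        })
    os.environ.update(settings)
    return settings


def list_s3_files(pattern):
    """Hypothetical helper: glob an s3:// pattern after the environment is set."""
    # Imports are deferred so the environment variables are in place first.
    import tensorflow as tf
    # Imported only for its side effect; without it, glob returned an
    # empty list in the experiment described above.
    import tensorflow_io as tfio  # noqa: F401

    return tf.io.gfile.glob(pattern)


# Example usage (values taken from this thread, adjust to your setup):
# configure_s3_env("eu-west-1", "https://s3.eu-west-1.amazonaws.com",
#                  profile="my-aws-profile", verbose=True)
# files = list_s3_files("s3://dataset/train/*.tfrec")
```

If the returned list is still empty, turning on verbose=True and reading the logs is probably the quickest way to tell an auth problem from a wrong endpoint.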

enricorotundo avatar Nov 11 '21 16:11 enricorotundo