kglab
access anonymous / public AWS S3 object
With Dask I can do:
df = dd.read_parquet('s3://bucket/key', storage_options={'anon': True})
and it will work for a public bucket / object on AWS S3
trying
kg = kglab.KnowledgeGraph()
kg.load_parquet('s3://bucket/key', storage_options={'anon': True})
returns: NoCredentialsError: Unable to locate credentials
Curious what the right way is to pass the anonymous (anon=True) option here.
Great point @fils !
Would it work to wrap these S3 URLs within some of the other libraries for working with them? In the load_parquet method there's support for using:
Although I haven't had a really good use case yet to test with for AWS – much of our testing is on GCP at the moment.
FWIW, we tried to integrate pathy as well, although we ran into some installation problems. If that would work better, we could revisit pathy?
@ceteri I likely lack the depth of experience to suggest a path. :)
What little I do know makes me think fsspec sounds interesting, if only because I am learning Dask and there seems to be a relation there.
I could sidestep this rather easily in many ways. Crudely, I could simply pull down the parquet file and load it locally, or just use my credentials. Anonymous AWS access is perhaps an edge case, given the issues it could raise for a data provider's wallet.
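A minimal sketch of that crude sidestep: a publicly readable S3 object can also be fetched over plain HTTPS with the standard library, after which kg.load_parquet can read the local file. The s3_to_https helper below is hypothetical, not part of kglab, and assumes the default virtual-hosted-style AWS endpoint:

```python
from urllib.parse import urlparse
from urllib.request import urlretrieve

def s3_to_https(s3_url: str) -> str:
    """Hypothetical helper: map s3://bucket/key to the public HTTPS
    endpoint, which serves anonymously readable objects with no credentials."""
    parsed = urlparse(s3_url)  # netloc is the bucket, path is the key
    return f"https://{parsed.netloc}.s3.amazonaws.com{parsed.path}"

# For a real public object one would then do (placeholders, not run here):
# urlretrieve(s3_to_https("s3://bucket/key"), "local.parquet")
# kg.load_parquet("local.parquet")
```

This avoids the credentials lookup entirely, at the cost of an extra local copy.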
Our use case is that it would be nice to let people explore some small data without any need for credentials, and we have to be using AWS S3... so here we are.
Anonymous access for kg.load_parquet could have its uses. If you have suggestions on a path for now, I'd take any guidance.