
feature request: maintain directory structure for local cache when using metaflow.s3.get_many

Open ophiry opened this issue 6 years ago • 2 comments

metaflow.s3.get_many (and the other get* methods) downloads files to a local cache dir, but doesn't maintain the original directory structure.

This is fine when the task needs access to a single file at a time (the path can be read from the resulting S3Object), but there are use cases where an internal library expects a subdirectory with a specific structure (like shared parquet datasets).
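Until something like this is built in, one possible workaround is to rebuild the tree by hand after the download. The sketch below is only an illustration, not part of Metaflow: `materialize_tree` is a hypothetical helper that takes (key, local path) pairs and copies each cached file to a destination derived from its key. The commented usage assumes that each S3Object exposes `key` and `path`, that keys are relative to `s3root`, and that the bucket and prefix names are placeholders.

```python
import shutil
from pathlib import Path, PurePosixPath


def materialize_tree(objects, target_dir):
    """Copy downloaded files into target_dir, using each key as the relative path.

    `objects` is an iterable of (key, local_path) pairs. This helper is
    hypothetical and not part of Metaflow.
    """
    target = Path(target_dir)
    for key, local_path in objects:
        rel = PurePosixPath(key)
        # Refuse keys that are empty or would escape target_dir.
        if not rel.parts or rel.is_absolute() or ".." in rel.parts:
            raise ValueError(f"unsafe key: {key!r}")
        dest = target.joinpath(*rel.parts)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_path, dest)
    return target


# Possible usage with Metaflow (assumed names, not executed here):
#
# from metaflow import S3
# with S3(s3root="s3://my-bucket/datasets/") as s3:
#     objs = s3.get_recursive(["events"])
#     # Copy inside the `with` block: the local cache is cleaned up on exit.
#     materialize_tree(((o.key, o.path) for o in objs), "local_dataset")
```

Copying doubles the disk usage for the duration of the task; symlinking into the cache would avoid that, but the links would break once the S3 client's temporary directory is removed.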

ophiry, Dec 18 '19 07:12

Thanks for opening the issue! We will look into it.

savingoyal, Dec 18 '19 21:12

https://outerbounds-community.slack.com/archives/C02116BBNTU/p1678430788687489?thread_ts=1677781380.902589&cid=C02116BBNTU

@tuulos You shared this helpful example a few months back, and we find ourselves using it quite often. It would be VERY beneficial to have this added to the S3 module.

Is there any reason for not having it? If not, can I give an MR adding this? Basically adding docstring with examples to the above methods and updating docs.

bsridatta, Sep 02 '23 18:09