
[BUG] s3 download does not check for existing file

Open meteodave opened this issue 1 year ago • 1 comments

Is this issue already tracked somewhere, or is this a new report?

  • [X] I've reviewed existing issues and couldn't find a duplicate for this problem.

Current Behavior

When using S3 access, earthaccess.download transfers every object, even if it already exists locally.

Expected Behavior

I expected the S3 earthaccess download to check whether the object is already available locally (as the HTTP download does) and skip the transfer (e.g., "File MYD04_3K.A2019214.1920.061.2019215152349.hdf already downloaded").

Steps To Reproduce

Run earthaccess.download twice for the same cloud-enabled data set, once via HTTP and once via S3, and compare the behavior on the second run.

Environment

- OS: Ubuntu 20.04.6
- Python: 3.8.10

Additional Context

No response

meteodave avatar Sep 16 '24 15:09 meteodave

Thank you, @meteodave. I can reproduce.

It looks like earthaccess.Store._download_file implements a simple check for whether the path exists, but neither earthaccess.Store._get_granules nor earthaccess.Store._get_urls performs that check when using the "direct" access logic.
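
For illustration only, here is a minimal sketch of the kind of path-exists check described above, applied to a list of S3 URLs. It is not the earthaccess implementation: the helper name `download_if_missing` and the `fetch` callable are hypothetical stand-ins for whatever transfer step the "direct" access path uses.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def download_if_missing(
    urls: Iterable[str],
    local_dir: Path,
    fetch: Callable[[str, Path], None],
) -> List[Path]:
    """Fetch each URL into local_dir, skipping files that already exist.

    `fetch` is a hypothetical stand-in for the real transfer step
    (for example an S3 get); it is not an earthaccess function.
    """
    local_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in urls:
        # Use the last path component of the URL as the local file name.
        target = local_dir / url.rstrip("/").split("/")[-1]
        if target.exists():
            # Same message style as the HTTP path reported in this issue.
            print(f"File {target.name} already downloaded")
        else:
            fetch(url, target)
        paths.append(target)
    return paths
```

This only tests for the presence of the file name. It would not detect a partial or stale local copy, so a size or checksum comparison might be worth considering as well.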

itcarroll avatar Sep 16 '24 15:09 itcarroll

@itcarroll I am interested in working on this. I am thinking of adding the earthaccess.Store._download_file check to the two functions you listed above. Is there anything else I need to take care of, in your opinion?

Sherwin-14 avatar Apr 19 '25 06:04 Sherwin-14

That would be awesome, @Sherwin-14! If you want, #595 is an adjacent issue. It's not essential that they be done together, but both involve the download API.

itcarroll avatar Apr 21 '25 13:04 itcarroll