[DNM] Allow for opening with known file sizes

Open jrbourbeau opened this issue 2 years ago • 5 comments

The latest release of s3fs allows you to specify the file size ahead of time (if known) when opening an S3 file (this already existed for HTTPS files). This allows s3fs to skip some calls to S3 which can be expensive (especially when opening lots of files). Marking as DNM for now as I'm still experimenting with what performance impacts this has in practice.

jrbourbeau avatar Oct 11 '23 22:10 jrbourbeau

Binder :point_left: Launch a binder notebook on this branch for commit c2bb1cdb092a6f34e97bb5dbe8827f4ea38581b6

I will automatically update this comment whenever this PR is modified

Binder :point_left: Launch a binder notebook on this branch for commit 0b6bb981bc832a1278a93caa3b1c16a38993e4d6

github-actions[bot] avatar Oct 11 '23 22:10 github-actions[bot]

> DNM

Surprising to still be seeing new-to-me acronyms for PR statuses :laughing:

What do you think of setting PRs not ready for merge to draft status? I'm fine with whatever, and I like the idea of standardized PR labels. Just curious what everyone else prefers. Are there automations that recognize the Do Not Merge label?

MattF-NSIDC avatar Oct 11 '23 22:10 MattF-NSIDC

It looks good, but I'm not sure where these sizes will be coming from. Granules have irregular sizes based on what they contain, and if we are relying only on what CMR has, there will be discrepancies. Will it be an issue with fsspec if we use a wrong size? @jrbourbeau

betolink avatar Oct 13 '23 18:10 betolink

Good point. We get the correct sizes when we create fsspec.https or s3fs file objects (they query the data store to get the size). The changes here just make sure that we keep track of that size so that, when we switch between https and s3 access and re-open the file, we reuse the already stored size. That makes subsequent re-openings faster, since we no longer need to query the data store for the file size.
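
To make the idea concrete, here is a minimal sketch of the size-reuse pattern, not the actual change in this PR. `FakeRemoteFS`, `FileHandle`, and `SizeCache` are hypothetical stand-ins written for illustration. The only assumption carried over from the real libraries is that a filesystem's `open` accepts an optional `size` keyword and skips its own size lookup when one is given.

```python
from dataclasses import dataclass


@dataclass
class FileHandle:
    """Hypothetical stand-in for an opened remote file object."""
    path: str
    size: int


class FakeRemoteFS:
    """Hypothetical stand-in for a remote filesystem (not fsspec or s3fs).

    It counts how often it has to look up a file size, which is the
    expensive round trip the stored size is meant to avoid.
    """

    def __init__(self, sizes):
        self._sizes = dict(sizes)
        self.size_lookups = 0

    def open(self, path, size=None):
        if size is None:
            # No size supplied: simulate querying the data store.
            self.size_lookups += 1
            size = self._sizes[path]
        return FileHandle(path=path, size=size)


class SizeCache:
    """Remember the size seen on first open and reuse it on re-open."""

    def __init__(self):
        self._known = {}

    def open(self, fs, key, path):
        # `key` identifies the same granule regardless of access protocol.
        handle = fs.open(path, size=self._known.get(key))
        self._known[key] = handle.size
        return handle


# Made-up URLs for the same granule over two protocols.
https_fs = FakeRemoteFS({"https://example.org/granule.h5": 1024})
s3_fs = FakeRemoteFS({"s3://example-bucket/granule.h5": 1024})

cache = SizeCache()
first = cache.open(https_fs, "granule.h5", "https://example.org/granule.h5")
second = cache.open(s3_fs, "granule.h5", "s3://example-bucket/granule.h5")
```

In this sketch the first open queries the store once, and the re-open over the other protocol does no size lookup at all because the stored size is passed through. The size always originates from the data store itself rather than from CMR metadata, which is why a wrong size should not arise here.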

jrbourbeau avatar Oct 13 '23 18:10 jrbourbeau

@jrbourbeau, any progress here? I'm marking this PR as Draft for now to help us know which PRs we should be actively reviewing.

chuckwondo avatar Apr 19 '24 14:04 chuckwondo