earthaccess
[DNM] Allow for opening with known file sizes
The latest release of s3fs lets you specify the file size ahead of time (if known) when opening an S3 file (this option already existed for HTTPS files). With the size supplied, s3fs can skip some calls to S3, which can be expensive, especially when opening lots of files. Marking as DNM for now, as I'm still experimenting with what performance impact this has in practice.
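To illustrate the idea, here is a minimal sketch of why a known size saves a round trip. It does not call s3fs itself: `CountingFS` is a hypothetical stand-in for a remote filesystem, and `open_with_known_size` is an illustrative helper, not part of earthaccess or s3fs. The only assumption carried over from the description is that the filesystem's `open` accepts a `size` keyword.

```python
class CountingFS:
    """Hypothetical stand-in for a remote filesystem; counts metadata lookups."""

    def __init__(self, sizes):
        self._sizes = sizes      # path -> size in bytes
        self.info_calls = 0      # number of simulated metadata requests

    def info(self, path):
        # Simulates the extra request needed to discover the object size.
        self.info_calls += 1
        return {"name": path, "size": self._sizes[path]}

    def open(self, path, mode="rb", size=None):
        # Only look the size up remotely when the caller did not supply it.
        if size is None:
            size = self.info(path)["size"]
        return {"path": path, "mode": mode, "size": size}


def open_with_known_size(fs, path, size=None):
    """Forward `size` only when it is known (assumes fs.open accepts `size`)."""
    if size is None:
        return fs.open(path, mode="rb")
    return fs.open(path, mode="rb", size=size)


fs = CountingFS({"bucket/granule.nc": 1024})
first = open_with_known_size(fs, "bucket/granule.nc")            # size looked up
second = open_with_known_size(fs, "bucket/granule.nc", size=first["size"])
print(fs.info_calls)  # one lookup in total; the second open reused the size
```

With a real filesystem the saving is one metadata request per file, which adds up when opening many granules.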
:point_left: Launch a binder notebook on this branch for commit c2bb1cdb092a6f34e97bb5dbe8827f4ea38581b6
I will automatically update this comment whenever this PR is modified
:point_left: Launch a binder notebook on this branch for commit 0b6bb981bc832a1278a93caa3b1c16a38993e4d6
DNM
Surprising to still be seeing new-to-me acronyms for PR statuses :laughing:
What do you think of setting PRs not ready for merge to draft status? I'm fine with whatever, and I like the idea of standardized PR labels. Just curious what everyone else prefers. Are there automations that recognize the Do Not Merge label?
It looks good, but I'm not sure where these sizes will be coming from. Granules have irregular sizes based on what they contain, and if we are relying only on what CMR has, there will be discrepancies. Will it be an issue with fsspec if we use a wrong size? @jrbourbeau
Good point. We get the correct sizes when we create fsspec.https or s3fs file objects (they query the data store to get the size). The changes here just make sure that we keep track of that size and, in the case where we switch between HTTPS and S3 access and re-open the file, we use the already stored size to make subsequent re-openings faster (we no longer need to query the data store for the file size).
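Roughly, the bookkeeping described above could look like the sketch below. This is not the code in this PR: `FakeStore` and `GranuleFile` are hypothetical names used only for illustration, and the stores are simulated so the example runs without network access. The point is that the size comes from the first real open, not from CMR, and is reused when the same file is re-opened through the other protocol.

```python
class FakeStore:
    """Hypothetical data store; counts how often it is asked for a file size."""

    def __init__(self, true_size):
        self.true_size = true_size
        self.size_lookups = 0

    def open(self, url, size=None):
        if size is None:
            # Simulates the remote query a real file object would make.
            self.size_lookups += 1
            size = self.true_size
        return {"url": url, "size": size}


class GranuleFile:
    """Hypothetical wrapper that remembers the size learned on first open."""

    def __init__(self, urls):
        self.urls = urls   # protocol name -> URL for the same file
        self.size = None   # unknown until the first open

    def open(self, protocol, store):
        # Pass the stored size (None on first open) and record what comes back.
        handle = store.open(self.urls[protocol], size=self.size)
        self.size = handle["size"]
        return handle


https_store = FakeStore(true_size=4096)
s3_store = FakeStore(true_size=4096)
granule = GranuleFile({
    "https": "https://example.invalid/granule.nc",
    "s3": "s3://example-bucket/granule.nc",
})

granule.open("https", https_store)  # first open queries the store for the size
granule.open("s3", s3_store)        # re-open over S3 reuses the stored size
print(https_store.size_lookups, s3_store.size_lookups)
```

Because the stored size originates from the data store itself, it should match the object regardless of which protocol is used for the re-open.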
@jrbourbeau, any progress here? I'm marking this PR as Draft for now to help us know which PRs we should be actively reviewing.