goes2go

Read NetCDF files from an S3 bucket without downloading them to the local system

Open najiya-12 opened this issue 3 years ago • 1 comments

I am looking for one year of ABI-L2-CODC data; however, downloading it with goes2go would mean a very large volume of data. One month alone is around 8000 files.

najiya-12, Jun 05 '22 03:06

Hi @najiya-12, thanks for your question.

goes2go doesn't currently have this capability, and I'm not currently planning to implement it (a pull request is welcome!). goes2go can give you a list of file paths for a time range:

# note: this uses the GOES class from the most recent push to the main branch
from goes2go import GOES
G = GOES(16, "ABI-L2-COD", domain='C')
df = G.timerange('2021-01-01', '2021-02-01', download=False)

With that info, xarray does have some capability of reading data directly from S3 without downloading it (see here). (I run into an error when I try that, ValueError: I/O operation on closed file. I wonder if that's an issue with my virtual environment or if it's a Windows problem.)
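As a rough illustration of that idea, here is a minimal sketch of opening one file from the list over S3. It is not part of goes2go. The helper names `to_s3_urls` and `open_remote` are made up for this example, and it assumes the file paths are S3 keys, with or without a leading `noaa-goes16/` bucket prefix. It uses s3fs with anonymous access and xarray's h5netcdf engine. One common cause of "I/O operation on closed file" is xarray reading lazily after the file object has been closed, so the sketch loads the data while the file is still open. That may or may not be the cause of the error above.

```python
from typing import Iterable, List


def to_s3_urls(file_paths: Iterable[str], bucket: str = "noaa-goes16") -> List[str]:
    """Turn S3 keys into full s3:// URLs (hypothetical helper).

    Assumption: each path is either a bare key, a key prefixed with the
    bucket name, or already a full s3:// URL.
    """
    urls = []
    for path in file_paths:
        key = path.lstrip("/")
        if key.startswith("s3://"):
            urls.append(key)
        elif key.startswith(bucket + "/"):
            urls.append("s3://" + key)
        else:
            urls.append(f"s3://{bucket}/{key}")
    return urls


def open_remote(url: str):
    """Read one NetCDF file from S3 into memory without saving it to disk."""
    # Third-party imports are local so to_s3_urls works without them installed.
    import s3fs
    import xarray as xr

    fs = s3fs.S3FileSystem(anon=True)  # public bucket, no credentials needed
    # Load the data before the file object is closed; xarray is lazy by
    # default and would otherwise try to read from a closed file later.
    with fs.open(url, "rb") as f:
        with xr.open_dataset(f, engine="h5netcdf") as ds:
            return ds.load()
```

For example, `open_remote(to_s3_urls(paths)[0])` would read the first file, where `paths` is the list of file paths returned for the time range. Each call still transfers the file's bytes over the network, so this avoids local storage, not the data transfer itself.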


What you are trying to do sounds like a "big data" problem. I would recommend checking out some other tools like https://github.com/fsspec/kerchunk. They might have the right tooling for your needs.

blaylockbk, Jun 05 '22 05:06