Proposed Recipes for GOES-16 and GOES-17 from AWS
Source Dataset
GOES satellites (GOES-16 & GOES-17) provide continuous weather imagery and monitoring of meteorological and space environment data across North America — the kind of continuous monitoring necessary for intensive data analysis. Because they are geostationary, they hover over one position on the surface and orbit high enough to allow a full-disc view of the Earth. Staying above a fixed spot lets them keep a constant vigil for the atmospheric "triggers" of severe weather conditions such as tornadoes, flash floods, hailstorms, and hurricanes. When these conditions develop, the GOES satellites are able to monitor storm development and track their movements.
- Link to the website / online documentation for the data
- https://registry.opendata.aws/noaa-goes/
- https://github.com/awslabs/open-data-docs/tree/main/docs/noaa/noaa-goes16
- The file format (e.g. netCDF, csv): netCDF
- How are the source files organized? (e.g. one file per day): many files per day
- How are the source files accessed: S3 and HTTPS (see the access sketch after this list)
- provide an example link if possible: https://noaa-goes17.s3.amazonaws.com/ABI-L1b-RadC/2018/240/00/OR_ABI-L1b-RadC-M3C03_G17_s20182400027156_e20182400029527_c20182400029559.nc
- Any special steps required to access the data (e.g. password required): No
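For concreteness, here is a minimal sketch of streaming one of these granules directly from the public bucket, assuming s3fs, xarray, and the h5netcdf engine are installed (the bucket allows anonymous access, so no credentials are needed):

```python
import s3fs
import xarray as xr

# Anonymous access: the NOAA GOES buckets are public, no credentials needed.
fs = s3fs.S3FileSystem(anon=True)

key = (
    "noaa-goes17/ABI-L1b-RadC/2018/240/00/"
    "OR_ABI-L1b-RadC-M3C03_G17_s20182400027156_e20182400029527_c20182400029559.nc"
)

# Stream the netCDF-4/HDF5 granule straight from S3 without downloading it first.
ds = xr.open_dataset(fs.open(key), engine="h5netcdf")
print(ds["Rad"])  # the L1b radiance variable
```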
Transformation / Alignment / Merging
I believe everything can be stacked into a single massive datacube.
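As a rough illustration of what that stacking could look like with plain xarray — assuming we restrict to a single channel so every granule shares the same (y, x) grid; the channel filter and hour prefix below are just placeholders:

```python
import s3fs
import xarray as xr

fs = s3fs.S3FileSystem(anon=True)

# One hour of CONUS granules for a single channel (C03), so every file
# shares the same (y, x) grid.
keys = [k for k in fs.ls("noaa-goes17/ABI-L1b-RadC/2018/240/00/") if "C03" in k]

datasets = [xr.open_dataset(fs.open(k), engine="h5netcdf") for k in keys]

# Each granule carries a scalar time coordinate `t`; concatenating along
# it promotes `t` to the stacking dimension of the cube.
cube = xr.concat(datasets, dim="t")
print(cube["Rad"].dims)  # ('t', 'y', 'x')
```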
Output Dataset
Zarr or Kerchunk-Zarr
cc @darothen
Some additional details:
- @blaylockbk has a nice download page here
- @blaylockbk also maintains a package at blaylockbk/goes2go/ that provides programmatic access to the AWS/NOAA servers for this data
> I believe everything can be stacked into a single massive datacube.
There's one caveat here, which is that the red and blue channels in the visible spectrum are actually double the resolution of the other bands, so you need to account for this in the underlying coordinate system(s) for any catalog which "stacks" the data across time.
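A quick way to see the mismatch, assuming s3fs/xarray as above, is to compare the grid shapes of a 0.5 km visible granule (band C02) against a 2 km IR granule (C13) from the same hour:

```python
import s3fs
import xarray as xr

fs = s3fs.S3FileSystem(anon=True)
hour = "noaa-goes17/ABI-L1b-RadC/2018/240/00/"

# Grab one 0.5 km visible granule (C02) and one 2 km IR granule (C13)
# from the same hour of CONUS imagery.
c02 = next(k for k in fs.ls(hour) if "C02" in k)
c13 = next(k for k in fs.ls(hour) if "C13" in k)

vis = xr.open_dataset(fs.open(c02), engine="h5netcdf")
ir = xr.open_dataset(fs.open(c13), engine="h5netcdf")

# Different (y, x) sizes mean different fixed grids, so these channels
# cannot share one coordinate system in a naive time-stacked catalog.
print(vis["Rad"].shape, ir["Rad"].shape)
```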
The data has a high temporal refresh rate — roughly 5–10 minutes for the CONUS and Full Disk imagery.
There are a lot of L2 derivative products, but the L1b radiances are the low-hanging fruit here and have significant utility across many, many use cases. Happy to write a few user stories if someone is looking for justification for spending time on this.
It may be worthwhile carving out smaller geographical sectors from the CONUS or Full Disk imagery, given the size of the raw data and downstream use cases.
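As a sketch of what carving a sector might look like: the ABI fixed grid is indexed by scan angle in radians rather than lat/lon, and the slice bounds below are illustrative placeholders, not a recommended region:

```python
import s3fs
import xarray as xr

fs = s3fs.S3FileSystem(anon=True)
key = (
    "noaa-goes17/ABI-L1b-RadC/2018/240/00/"
    "OR_ABI-L1b-RadC-M3C03_G17_s20182400027156_e20182400029527_c20182400029559.nc"
)
ds = xr.open_dataset(fs.open(key), engine="h5netcdf")

# The ABI fixed grid is indexed by scan angle (radians), not lat/lon.
# `y` decreases from north to south, hence the reversed slice bounds.
sector = ds.sel(x=slice(-0.02, 0.02), y=slice(0.10, 0.06))
print(sector["Rad"].shape)
```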
More thoughts:
- The very high refresh rate is a perfect use case for the appending capability discussed in https://github.com/pangeo-forge/user-stories/issues/5. It would be awesome to make this a near-real-time recipe. But for the shorter term, simply getting a static recipe working would be best.
- Given the massive size of the dataset, we definitely don't want to copy the data. We need a kerchunk recipe; a rough sketch follows this list.
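Here is a rough sketch of what that kerchunk recipe could look like, using kerchunk's SingleHdf5ToZarr and MultiZarrToZarr; the single-channel filter and the choice of `t` as the concat dimension are assumptions on my part:

```python
import fsspec
import xarray as xr
from kerchunk.hdf import SingleHdf5ToZarr
from kerchunk.combine import MultiZarrToZarr

fs = fsspec.filesystem("s3", anon=True)
urls = [
    "s3://" + k
    for k in fs.ls("noaa-goes17/ABI-L1b-RadC/2018/240/00/")
    if "C03" in k  # one channel, so every file shares a grid
]

# Index each HDF5 granule: this records byte ranges, it copies no data.
refs = []
for u in urls:
    with fs.open(u) as f:
        refs.append(SingleHdf5ToZarr(f, u).translate())

# Merge the per-file indexes along the time coordinate `t`.
combined = MultiZarrToZarr(
    refs,
    concat_dims=["t"],
    remote_protocol="s3",
    remote_options={"anon": True},
).translate()

# The combined reference set opens as a lazy Zarr view over the original
# netCDF files still sitting in the NOAA bucket.
ds = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": {
            "fo": combined,
            "remote_protocol": "s3",
            "remote_options": {"anon": True},
        },
    },
)
```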
> There's one caveat here, which is that the red and blue channels in the visible spectrum are actually double the resolution of the other bands, so you need to account for this in the underlying coordinate system(s) for any catalog which "stacks" the data across time.
The way we can handle this today is by simply having different recipes for the different resolution products, and building them to separate datasets. Would that be acceptable?
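In plain-xarray terms, the per-resolution approach might look like the sketch below; in an actual pangeo-forge recipe each loop iteration would be its own file pattern and output store (the channel list and store names are placeholders):

```python
import s3fs
import xarray as xr

fs = s3fs.S3FileSystem(anon=True)
hour = "noaa-goes17/ABI-L1b-RadC/2018/240/00/"

# One recipe / output store per channel, so each dataset keeps a single,
# self-consistent grid.
for chan in ["C02", "C13"]:  # 0.5 km visible, 2 km IR
    keys = [k for k in fs.ls(hour) if chan in k]
    datasets = [xr.open_dataset(fs.open(k), engine="h5netcdf") for k in keys]
    cube = xr.concat(datasets, dim="t")
    cube.to_zarr(f"goes17-radc-{chan}.zarr", mode="w")  # separate stores
```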
> The way we can handle this today is by simply having different recipes for the different resolution products, and building them to separate datasets. Would that be acceptable?
Sounds good! Simple solutions are always best.
There may be interest in tackling this on @GoogleCloudPlatform, too. Tagging @shanecglass and @alxmrs for visibility.