staged-recipes
Proposed Recipes for NASA MODIS-COSP data (satellite observations of clouds)
Source Dataset
These data provide satellite observations of clouds targeted at the evaluation of global models, facilitated by the use of synthetic observations ("satellite simulators"). The data are a repackaging of standard "Level-3" (gridded, aggregated) cloud products produced by NASA's MODIS satellites; data from both instruments (morning/Terra and afternoon/Aqua) are combined. The fields conform to output from the "MODIS simulator," one of several used in the CFMIP Observation Simulator Package (COSP, paper1, paper2). Output from COSP and the MODIS simulator is requested as part of the Cloud Feedbacks Model Intercomparison Project (CFMIP), part of CMIP.
- Data are described at https://ladsweb.modaps.eosdis.nasa.gov/missions-and-measurements/products/MCD06COSP_M3_MODIS.
- The source files are netCDF4, one per month (a daily product is also available). There are 23 groups, each corresponding to a variable. Each group contains a range of statistical quantities. Some groups also contain joint histograms with secondary parameters.
- The files are available directly, through OpenDAP (example file) or for staging for download through a GUI.
- The data record begins in 2002 and is updated monthly as new data are acquired.
Transformation / Alignment / Merging
Ideally we would provide several related datasets. One would contain the mean values for many or even all scalar fields. This means extracting the mean value from each group in each file and concatenating the fields in time. A second would be the joint histograms, which need to be extracted and normalized, have their metadata refined, and also be concatenated in time. Since each joint histogram is roughly 50 times as large as a scalar field, it may be best to create one dataset per joint histogram (there are about a dozen).
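As a rough check on the size estimate above, the bin edges listed in the files' YAML_config attribute (visible in the error output later in this thread) give 7 x 7 = 49 bins for the optical thickness vs. cloud top pressure histogram. The sketch below counts the bins and shows one possible normalization. Dividing by the total count so that the bins sum to one is an assumption, since the proposal does not say which normalization is intended, and the helper names `n_bins` and `normalize` are hypothetical.

```python
# Bin edges as listed in the YAML_config attribute of the source files
# (JHisto_vs_Cloud_Top_Pressure for Cloud_Optical_Thickness_Total).
TAU_EDGES = [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0]
CTP_EDGES = [0.0, 180.0, 310.0, 440.0, 560.0, 680.0, 800.0, 10000.0]


def n_bins(edges):
    """Number of histogram bins defined by a list of bin edges."""
    return len(edges) - 1


def normalize(counts):
    """Assumed normalization: divide each bin by the total count.

    `counts` is the joint histogram for a single grid cell, given as a
    list of rows. An all-zero histogram is returned as all zeros.
    """
    total = sum(sum(row) for row in counts)
    if total == 0:
        return [[0.0 for _ in row] for row in counts]
    return [[c / total for c in row] for row in counts]


# Values stored per grid cell for one joint histogram, versus 1 for a scalar mean.
cells_per_histogram = n_bins(TAU_EDGES) * n_bins(CTP_EDGES)  # 7 * 7 = 49
print(cells_per_histogram)
```

Histograms against effective radius use 7 radius edges (6 bins), so they are somewhat smaller at 7 x 6 = 42 values per grid cell.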
Output Dataset
Given the user community it would be useful to produce netCDF output, perhaps with a kerchunk index. It would be fine to also produce a Zarr or related dataset, perhaps in addition. Whatever the format, the data should be structured so that it's easy to append new data as it is produced.
Thanks for this proposal, @RobertPincus.
If I understand correctly, there are at least two related (but distinct) goals outlined here:
- Creating a mirror of the complete dataset(s) in some cloud optimized format
- Producing various reductions (means, joint histograms) from the complete dataset(s)
Generally, Pangeo Forge is focused on being very good at goal 1: producing optimized mirrors of archival datasets (in complete form). Once this is accomplished, goal 2 becomes much easier, as the data will be staged in a manner conducive to scalable parallel computation.
In terms of goal 1 (producing the cloud optimized mirror), I note that programmatic distribution of this data is available via OPeNDAP. This would suggest to me that producing a Zarr dataset may be our best option. Kerchunk would be a more efficient option if the data were already staged as netCDFs. Given that is not the case, producing a kerchunk index would entail storing the dataset as netCDFs on the cloud and then producing the kerchunk indexes, a less efficient (two-step) process than directly creating a Zarr store from the OPeNDAP endpoint.
If starting with creation of a Zarr copy of this dataset is an acceptable starting place, we can use XarrayZarrRecipe to accomplish this. This recipe class supports OPeNDAP inputs.
Is working on this recipe (a few dozen lines of Python code) something you or someone in your group is interested in? If so I can point to the relevant documentation for getting started. If not, we can open this up to others (myself included, perhaps) to collaborate on this development, though note that this latter option may take a bit longer to get spun up.
Looking forward to bringing this vision to life!
@cisaacstern, thanks very much for this feedback.
As a point of clarification, the data already contains the means, joint histograms, etc. that we want - they are just accessed via netCDF groups.
One wrinkle in the ointment is that the file names contain the date of production. Since we don't know this date a priori it amounts to a quasi-random string. Do you know if there's a way in OpenDAP to specify opening files that match a certain pattern including a wildcard?
I'm open to outputting Zarr; if people want to recycle the recipe to make local netCDF mirrors that'll be easy enough. I don't yet understand if, say, one Zarr object is roughly equivalent to a netCDF file, or if a single object could include many variables.
For a first try you can certainly point me to documentation and I can see how far I can get.
Thanks a lot.
One wrinkle in the ointment is that the file names contain the date of production. Since we don't know this date a priori it amounts to a quasi-random string.
This is a really annoying feature of many datasets. Do we know if the hyrax server exposes a TDS catalog or any other catalog? If so, we could crawl it to populate the FilePattern.
I'll see if I can find out about a TDS catalog. JSON files are provided, at least (top level).
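One way to handle the unknown production date, sketched here under assumptions: fetch the JSON directory listing and pick the entry whose name matches a wildcard pattern built from the known parts of the filename. The function name `resolve_filename` is hypothetical, and the listing is assumed to be a list of objects with a "name" key, as in the JSON catalog linked above. The example listing is hard-coded so that the sketch runs without network access.

```python
from fnmatch import fnmatch


def resolve_filename(listing, date_code, product="MCD06COSP_M3_MODIS", collection="061"):
    """Return the single file name in `listing` for a given acquisition date.

    `date_code` is the year plus three-digit day of year, e.g. "2002182".
    The production timestamp is unknown ahead of time, so it is matched
    with a wildcard.
    """
    pattern = f"{product}.A{date_code}.{collection}.*.nc"
    matches = [entry["name"] for entry in listing if fnmatch(entry["name"], pattern)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match for {pattern!r}, found {len(matches)}")
    return matches[0]


# Hard-coded stand-in for the parsed JSON listing of one directory.
listing = [{"name": "MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc"}]
print(resolve_filename(listing, "2002182"))
```

Raising an error when zero or several files match makes a reprocessed or missing month visible instead of silently picking an arbitrary file.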
@RobertPincus thanks for the clarification. Here is the documentation on recipe contribution. (This was published just this morning, so if anything doesn't make sense, that's my fault! Please let me know if so and I will amend.)
Re: your question about what a Zarr store can represent, a single Zarr store can include as many variables as we want, so long as they exist on the same time dimension.
As you'll see in the linked docs, you'll want to define a Recipe Object (in this case, an XarrayZarrRecipe), which requires a FilePattern as input. The FilePattern itself requires a url format function as input, which is a Python function that can create a valid url path to the source data based on, e.g., a date input.
I've worked out a start for this format function based on the (very helpful!) JSON catalog link you provided:
import pandas as pd
import requests

BASE_URL = "http://ladsweb.modaps.eosdis.nasa.gov"
DATASET_ID = "61/MCD06COSP_M3_MODIS"
dates = pd.date_range("2002-07-01", "2021-07-01", freq="MS")  # "MS" for "month start"


def make_url(date):
    """Make an OPeNDAP url for NASA MODIS-COSP data based on an input date.

    :param date: A member of the ``pandas.core.indexes.datetimes.DatetimeIndex``
        created with ``dates = pd.date_range("2002-07-01", "2021-07-01", freq="MS")``.
    """
    # Zero-pad the day of year to three digits (e.g. 1 January -> "001"),
    # which is how the archive directories are named.
    day_of_year = f"{date.timetuple().tm_yday:03d}"
    # Look up the directory listing to find the file name, which includes
    # a production timestamp that cannot be known in advance.
    response = requests.get(
        f"{BASE_URL}/archive/allData/{DATASET_ID}/{date.year}/{day_of_year}.json"
    )
    response.raise_for_status()
    filename = [r["name"] for r in response.json()][0]
    return f"{BASE_URL}/opendap/hyrax/allData/{DATASET_ID}/{date.year}/{day_of_year}/{filename}"
This function faithfully reproduces the example url you provided in your first comment on this thread:
url = make_url(dates[0])
print(url)
http://ladsweb.modaps.eosdis.nasa.gov/opendap/hyrax/allData/61/MCD06COSP_M3_MODIS/2002/182/MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc
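For reference, the parts of this file name can be decoded with the standard library. This is a minimal sketch, assuming the last numeric field is the production time written as year, day of year, hour, minute, second. That reading is consistent with the date_created attribute of this file but is not confirmed in this thread, and `parse_granule_name` is a hypothetical helper.

```python
from datetime import datetime


def parse_granule_name(filename):
    """Split a MODIS-COSP file name into its dot-separated fields."""
    product, acquired, collection, produced, _extension = filename.split(".")
    return {
        "product": product,
        # "A2002182" -> year 2002, day of year 182
        "acquired": datetime.strptime(acquired[1:], "%Y%j").date(),
        "collection": collection,
        # Assumed layout: YYYYDDDHHMMSS
        "produced": datetime.strptime(produced, "%Y%j%H%M%S"),
    }


info = parse_granule_name("MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc")
print(info["acquired"])  # 2002-07-01
print(info["produced"])  # 2020-06-29 14:58:24
```

Only the acquisition date is predictable from the month being requested, which is why the format function above has to query the directory listing for the rest of the name.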
However I get an error when trying to open this URL with xarray:
import xarray as xr
ds = xr.open_dataset(url)
syntax error, unexpected WORD_WORD, expecting ';' or ','
context: Attributes { latitude { Float64 _FillValue -999.0000000000000; String units "degrees_north"; } longitude { Float64 _FillValue -999.0000000000000; String units "degrees_east"; } NC_GLOBAL { String YAML_config "grid_settings: gridsize: 1 projection: conformal lat_in: Latitude lon_in: Longitude lat_out: Latitude lon_out: Longitude fill_value: -999variable_settings: - name_in: Solar_Zenith name_out: Solar_Zenith attributes: - name: long_name value: Solar Zenith Angle (Cell to Sun) for Daytime Scenes - name: units value: degrees - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 180.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Solar_Azimuth name_out: Solar_Azimuth attributes: - name: long_name value: Solar Azimuth Angle (Cell to Sun) for Daytime Scenes - name: units value: degrees - name: _FillValue value: -999.0 - name: valid_min value: -180.0 - name: valid_max value: 180.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Sensor_Zenith name_out: Sensor_Zenith attributes: - name: long_name value: Sensor Zenith Angle (Cell to Sensor) for Daytime Scenes - name: units value: degrees - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 180.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Sensor_Azimuth name_out: Sensor_Azimuth attributes: - name: long_name value: Sensor Azimuth Angle (Cell to Sensor) for Daytime Scenes - name: units value: degrees - name: _FillValue value: -999.0 - name: valid_min value: -180.0 - name: valid_max value: 180.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Cloud_Top_Pressure name_out: Cloud_Top_Pressure attributes: - name: long_name value: Cloud Top Pressure for Daytime Scenes - name: units value: mb - name: _FillValue value: -999.0 - name: valid_min value: 1.0 - name: valid_max value: 
1100.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Cloud_Fraction name_out: Cloud_Mask_Fraction attributes: - name: long_name value: Cloud Fraction from Cloud Mask for Daytime Scenes - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Cloud_Fraction name_out: Cloud_Mask_Fraction_Low attributes: - name: long_name value: Cloud Fraction from Cloud Mask (Low, CTP GE 680 hPa) for Daytime Scenes - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - Mask_Low - name_in: Cloud_Fraction name_out: Cloud_Mask_Fraction_Mid attributes: - name: long_name value: Cloud Fraction from Cloud Mask (Mid, 680 hPa GT CTP GE 440 hPa) for Daytime Scenes - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - Mask_Middle - name_in: Cloud_Fraction name_out: Cloud_Mask_Fraction_High attributes: - name: long_name value: Cloud Fraction from Cloud Mask (High, CTP LT 440 hPa) for Daytime Scenes - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - Mask_High - name_in: Cloud_Optical_Thickness name_out: Cloud_Optical_Thickness_Liquid attributes: - name: long_name value: Cloud Optical Thickness for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 150.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 2D_histograms: - name_out: 
JHisto_vs_Cloud_Particle_Size_Liquid primary_var: edges: [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0] joint_var: name_in: Cloud_Effective_Radius edges: [4.0, 8.0, 10.0, 13.0, 15.0, 20.0, 30.0] masks: - Mask_Valid_Range_CER - Mask_Liquid_Water_Phase_Clouds - name_in: Cloud_Optical_Thickness name_out: Cloud_Optical_Thickness_Ice attributes: - name: long_name value: Cloud Optical Thickness for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 150.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 2D_histograms: - name_out: JHisto_vs_Cloud_Particle_Size_Ice primary_var: edges: [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0] joint_var: name_in: Cloud_Effective_Radius edges: [5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0] masks: - Mask_Ice_Phase_Clouds - name_in: Cloud_Optical_Thickness name_out: Cloud_Optical_Thickness_Total attributes: - name: long_name value: Cloud Optical Thickness for Combined (LiquidWater+Ice+Undetermined) Phase Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 150.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 2D_histograms: - name_out: JHisto_vs_Cloud_Top_Pressure primary_var: edges: [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0] joint_var: name_in: Cloud_Top_Pressure edges: [0.0, 180.0, 310.0, 440.0, 560.0, 680.0, 800.0, 10000.0] masks: - Mask_Valid_Range_CER - Mask_Combined_Phase_Clouds - name_in: Cloud_Optical_Thickness_PCL name_out: Cloud_Optical_Thickness_PCL_Total only_histograms: attributes: - name: long_name value: Cloud Optical Thickness for Combined (LiquidWater+Ice+Undetermined) Phase Clouds (3.7 micron Retrieval for Partly Cloudy (PCL) Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 150.0 - name: scale_factor value: 1.0 - 
name: add_offset value: 0.0 2D_histograms: - name_out: JHisto_vs_Cloud_Top_Pressure primary_var: edges: [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0] joint_var: name_in: Cloud_Top_Pressure edges: [0.0, 180.0, 310.0, 440.0, 560.0, 680.0, 800.0, 10000.0] masks: - Mask_Valid_Range_CERPCL - Mask_Combined_Phase_Clouds - name_in: Cloud_Optical_Thickness_Log name_out: Cloud_Optical_Thickness_Log10_Liquid attributes: - name: long_name value: Cloud Optical Thickness Log10 for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: -2.0 - name: valid_max value: 2.176 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Valid_Range_CER - Mask_Liquid_Water_Phase_Clouds - name_in: Cloud_Optical_Thickness_Log name_out: Cloud_Optical_Thickness_Log10_Ice attributes: - name: long_name value: Cloud Optical Thickness Log10 for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: -2.0 - name: valid_max value: 2.176 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Ice_Phase_Clouds - name_in: Cloud_Optical_Thickness_Log name_out: Cloud_Optical_Thickness_Log10_Total attributes: - name: long_name value: Cloud Optical Thickness Log10 for Combined (LiquidWater+Ice+Undetermined) Phase Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: -2.0 - name: valid_max value: 2.176 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Valid_Range_CER - Mask_Combined_Phase_Clouds - name_in: Cloud_Effective_Radius name_out: Cloud_Particle_Size_Liquid attributes: - name: long_name value: Cloud Effective Radius for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: microns - name: _FillValue value: -999.0 - name: valid_min value: 4.0 - name: valid_max value: 
30.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Valid_Range_CER - Mask_Liquid_Water_Phase_Clouds - name_in: Cloud_Effective_Radius name_out: Cloud_Particle_Size_Ice attributes: - name: long_name value: Cloud Effective Radius for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: microns - name: _FillValue value: -999.0 - name: valid_min value: 5.0 - name: valid_max value: 60.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Ice_Phase_Clouds - name_in: Cloud_Water_Path name_out: Cloud_Water_Path_Liquid attributes: - name: long_name value: Cloud Water Path for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: g/m^2 - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 3000.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Valid_Range_CER - Mask_Liquid_Water_Phase_Clouds - name_in: Cloud_Water_Path name_out: Cloud_Water_Path_Ice attributes: - name: long_name value: Cloud Water Path for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: g/m^2 - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 6000.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Ice_Phase_Clouds - name_in: COPR_Liquid name_out: Cloud_Retrieval_Fraction_Liquid attributes: - name: long_name value: Cloud Optical Properties Retrieval Fraction (Liquid Water Clouds) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 - name_in: COPR_Ice name_out: Cloud_Retrieval_Fraction_Ice attributes: - name: long_name value: Cloud Optical Properties Retrieval Fraction (Ice Clouds) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: 
add_offset value: 0.0 - name_in: COPR_Combined name_out: Cloud_Retrieval_Fraction_Total attributes: - name: long_name value: Cloud Optical Properties Retrieval Fraction (Combined (LiquidWater+Ice+Undetermined) Phase Clouds) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0"; String Yori_version "1.3.16"; String daily_defn_of_day_adjustment "False"; String input_files "MCD06COSP_D3_MODIS.A2002185.061.2020179074148.nc,MCD06COSP_D3_MODIS.A2002186.061.2020179074020.nc,MCD06COSP_D3_MODIS.A2002187.061.2020179080105.nc,MCD06COSP_D3_MODIS.A2002188.061.2020179073800.nc,MCD06COSP_D3_MODIS.A2002189.061.2020179075527.nc,MCD06COSP_D3_MODIS.A2002190.061.2020181140712.nc,MCD06COSP_D3_MODIS.A2002191.061.2020179073354.nc,MCD06COSP_D3_MODIS.A2002192.061.2020181140657.nc,MCD06COSP_D3_MODIS.A2002193.061.2020181140639.nc,MCD06COSP_D3_MODIS.A2002194.061.2020181140633.nc,MCD06COSP_D3_MODIS.A2002195.061.2020179073600.nc,MCD06COSP_D3_MODIS.A2002196.061.2020179071759.nc,MCD06COSP_D3_MODIS.A2002197.061.2020179073136.nc,MCD06COSP_D3_MODIS.A2002198.061.2020181140638.nc,MCD06COSP_D3_MODIS.A2002199.061.2020179073626.nc,MCD06COSP_D3_MODIS.A2002200.061.2020181140632.nc,MCD06COSP_D3_MODIS.A2002201.061.2020181140623.nc,MCD06COSP_D3_MODIS.A2002202.061.2020179073345.nc,MCD06COSP_D3_MODIS.A2002203.061.2020179072223.nc,MCD06COSP_D3_MODIS.A2002204.061.2020179072036.nc,MCD06COSP_D3_MODIS.A2002205.061.2020179074935.nc,MCD06COSP_D3_MODIS.A2002206.061.2020179072758.nc,MCD06COSP_D3_MODIS.A2002207.061.2020179074751.nc,MCD06COSP_D3_MODIS.A2002208.061.2020179074110.nc,MCD06COSP_D3_MODIS.A2002209.061.2020179073958.nc,MCD06COSP_D3_MODIS.A2002210.061.2020181140608.nc,MCD06COSP_D3_MODIS.A2002211.061.2020181140441.nc,MCD06COSP_D3_MODIS.A2002212.061.2020181140457.nc"; String history ""; String source "idl 8.4, mcd06cosp_preyori 20191204-1, yori 1.3.16"; String date_created 
"2020-06-29T14:58:03Z"; String product_name "MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc"; String LocalGranuleID "MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc"; String Conventions "CF-1.6, ACDD-1.3"; String ShortName "MCD06COSP_M3_MODIS"; String product_version "6.1.2"; String AlgorithmType "OPS"; String identifier_product_doi "10.5067/MODIS/MCD06COSP_M3_MODIS.061"; String identifier_product_doi_authority "http://dx.doi.org/"; String ancillary_files ""; String DataCenterId "UWI-MAD/SSEC/ASIPS"; String project "NASA VIIRS Atmosphere SIPS"; String creator_name "NASA VIIRS Atmosphere SIPS"; String creator_url "https://sips.ssec.wisc.edu/"; String creator_email "[email protected]"; String creator_institution "Space Science & Engineering Center, University of Wisconsin - Madison"; String publisher_name "LAADS"; String publisher_url "https://ladsweb.modaps.eosdis.nasa.gov/"; String publisher_email "[email protected]"; String publisher_institution "NASA Level-1 and Atmosphere Archive & Distribution System"; String time_coverage_start "2002-07-01T00:00:00.000000"; String time_coverage_end "2002-07-31T23:59:59.000000"; String xmlmetadata "<?xml version="1.0"^?><!DOCTYPE GranuleMetaDataFile SYSTEM "http://ecsinfo.gsfc.nasa.gov/ECSInfo/ecsmetadata/dtds/DPL/ECS/ScienceGranuleMetadata.dtd"><GranuleMetaDataFile> <DTDVersion>1.0</DTDVersion> <DataCenterId>UWI-MAD/SSEC/ASIPS</DataCenterId> <GranuleURMetaData> <CollectionMetaData> <ShortName>MCD06COSP_M3_MODIS</ShortName> <VersionID>61</VersionID> </CollectionMetaData> <ECSDataGranule> <ReprocessingPlanned>no further reprocessing anticipated</ReprocessingPlanned> <LocalGranuleID>MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc</LocalGranuleID> <ProductionDateTime>2020-06-29 14:58:49.491586</ProductionDateTime> <LocalVersionID>61</LocalVersionID> </ECSDataGranule> <PGEVersionClass> <PGEVersion>6.1.2</PGEVersion> </PGEVersionClass> <RangeDateTime> <RangeEndingTime>23:59:59.000000</RangeEndingTime> 
<RangeEndingDate>2002-07-31</RangeEndingDate> <RangeBeginningTime>00:00:00.000000</RangeBeginningTime> <RangeBeginningDate>2002-07-01</RangeBeginningDate> </RangeDateTime> <SpatialDomainContainer> <HorizontalSpatialDomainContainer> <BoundingRectangle> <WestBoundingCoordinate>-180</WestBoundingCoordinate> <NorthBoundingCoordinate>90</NorthBoundingCoordinate> <EastBoundingCoordinate>180</EastBoundingCoordinate> <SouthBoundingCoordinate>-90</SouthBoundingCoordinate> </BoundingRectangle> </HorizontalSpatialDomainContainer> </SpatialDomainContainer> <Platform> <PlatformShortName>Suomi NPP</PlatformShortName> <Instrument> <InstrumentShortName>VIIRS</InstrumentShortName> <Sensor> <SensorShortName>VIIRS</SensorShortName> </Sensor> </Instrument> </Platform> <InputGranule> <InputPointer>MCD06COSP_D3_MODIS.A2002185.061.2020179074148.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002186.061.2020179074020.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002187.061.2020179080105.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002188.061.2020179073800.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002189.061.2020179075527.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002190.061.2020181140712.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002191.061.2020179073354.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002192.061.2020181140657.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002193.061.2020181140639.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002194.061.2020181140633.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002195.061.2020179073600.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002196.061.2020179071759.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002197.061.2020179073136.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002198.061.2020181140638.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002199.061.2020179073626.nc</InputPointer> 
<InputPointer>MCD06COSP_D3_MODIS.A2002200.061.2020181140632.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002201.061.2020181140623.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002202.061.2020179073345.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002203.061.2020179072223.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002204.061.2020179072036.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002205.061.2020179074935.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002206.061.2020179072758.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002207.061.2020179074751.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002208.061.2020179074110.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002209.061.2020179073958.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002210.061.2020181140608.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002211.061.2020181140441.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002212.061.2020181140457.nc</InputPointer> </InputGranule> <AncillaryInputGranules> </AncillaryInputGranules> </GranuleURMetaData></GranuleMetaDataFile>"; String platform "Aqua, Terra"; String instrument "MODIS"; String processing_level "L3"; String format "NetCDF4"; String title "Aqua/Terra MODIS Cloud Properties Level 3 monthly, 1x1 degree grid (MCD06COSP_M3_MODIS)"; String long_name "MODIS (Aqua/Terra) Cloud Properties Level 3 monthly, 1x1 degree grid"; String version_id "061"; Float64 geospatial_lat_max 90.00000000000000; Float64 geospatial_lat_min -90.00000000000000; Float64 geospatial_lon_min 180.0000000000000; Float64 geospatial_lon_max -180.0000000000000; Float64 NorthBoundingCoordinate 90.00000000000000; Float64 SouthBoundingCoordinate -90.00000000000000; Float64 EastBoundingCoordinate 180.0000000000000; Float64 WestBoundingCoordinate -180.0000000000000; Float64 latitude_resolution 1.000000000000000; Float64 longitude_resolution 1.000000000000000; String license 
"http://science.nasa.gov/earth-science/earth-science-data/data-information-policy/"; String stdname_vocabulary "NetCDF Climate and Forecast (CF) Metadata Convention"; String keywords_vocabulary "NASA Global Change Master Directory (GCMD) Science Keywords"; String keywords "EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD MICROPHYSICS > CLOUD OPTICAL DEPTH/THICKNESS, EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD PROPERTIES > CLOUD TOP HEIGHT, EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD PROPERTIES > CLOUD FRACTION"; String naming_authority "gov.nasa.gsfc.sci.atmos"; }}
Illegal attribute
"2020-06-29T14:58:03Z"; String product_name "MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc"; String LocalGranuleID "MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc"; String Conventions "CF-1.6, ACDD-1.3"; String ShortName "MCD06COSP_M3_MODIS"; String product_version "6.1.2"; String AlgorithmType "OPS"; String identifier_product_doi "10.5067/MODIS/MCD06COSP_M3_MODIS.061"; String identifier_product_doi_authority "http://dx.doi.org/"; String ancillary_files ""; String DataCenterId "UWI-MAD/SSEC/ASIPS"; String project "NASA VIIRS Atmosphere SIPS"; String creator_name "NASA VIIRS Atmosphere SIPS"; String creator_url "https://sips.ssec.wisc.edu/"; String creator_email "[email protected]"; String creator_institution "Space Science & Engineering Center, University of Wisconsin - Madison"; String publisher_name "LAADS"; String publisher_url "https://ladsweb.modaps.eosdis.nasa.gov/"; String publisher_email "[email protected]"; String publisher_institution "NASA Level-1 and Atmosphere Archive & Distribution System"; String time_coverage_start "2002-07-01T00:00:00.000000"; String time_coverage_end "2002-07-31T23:59:59.000000"; String xmlmetadata "<?xml version="1.0"^?><!DOCTYPE GranuleMetaDataFile SYSTEM "http://ecsinfo.gsfc.nasa.gov/ECSInfo/ecsmetadata/dtds/DPL/ECS/ScienceGranuleMetadata.dtd"><GranuleMetaDataFile> <DTDVersion>1.0</DTDVersion> <DataCenterId>UWI-MAD/SSEC/ASIPS</DataCenterId> <GranuleURMetaData> <CollectionMetaData> <ShortName>MCD06COSP_M3_MODIS</ShortName> <VersionID>61</VersionID> </CollectionMetaData> <ECSDataGranule> <ReprocessingPlanned>no further reprocessing anticipated</ReprocessingPlanned> <LocalGranuleID>MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc</LocalGranuleID> <ProductionDateTime>2020-06-29 14:58:49.491586</ProductionDateTime> <LocalVersionID>61</LocalVersionID> </ECSDataGranule> <PGEVersionClass> <PGEVersion>6.1.2</PGEVersion> </PGEVersionClass> <RangeDateTime> <RangeEndingTime>23:59:59.000000</RangeEndingTime> 
<RangeEndingDate>2002-07-31</RangeEndingDate> <RangeBeginningTime>00:00:00.000000</RangeBeginningTime> <RangeBeginningDate>2002-07-01</RangeBeginningDate> </RangeDateTime> <SpatialDomainContainer> <HorizontalSpatialDomainContainer> <BoundingRectangle> <WestBoundingCoordinate>-180</WestBoundingCoordinate> <NorthBoundingCoordinate>90</NorthBoundingCoordinate> <EastBoundingCoordinate>180</EastBoundingCoordinate> <SouthBoundingCoordinate>-90</SouthBoundingCoordinate> </BoundingRectangle> </HorizontalSpatialDomainContainer> </SpatialDomainContainer> <Platform> <PlatformShortName>Suomi NPP</PlatformShortName> <Instrument> <InstrumentShortName>VIIRS</InstrumentShortName> <Sensor> <SensorShortName>VIIRS</SensorShortName> </Sensor> </Instrument> </Platform> <InputGranule> <InputPointer>MCD06COSP_D3_MODIS.A2002185.061.2020179074148.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002186.061.2020179074020.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002187.061.2020179080105.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002188.061.2020179073800.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002189.061.2020179075527.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002190.061.2020181140712.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002191.061.2020179073354.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002192.061.2020181140657.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002193.061.2020181140639.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002194.061.2020181140633.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002195.061.2020179073600.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002196.061.2020179071759.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002197.061.2020179073136.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002198.061.2020181140638.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002199.061.2020179073626.nc</InputPointer> 
<InputPointer>MCD06COSP_D3_MODIS.A2002200.061.2020181140632.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002201.061.2020181140623.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002202.061.2020179073345.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002203.061.2020179072223.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002204.061.2020179072036.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002205.061.2020179074935.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002206.061.2020179072758.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002207.061.2020179074751.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002208.061.2020179074110.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002209.061.2020179073958.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002210.061.2020181140608.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002211.061.2020181140441.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002212.061.2020181140457.nc</InputPointer> </InputGranule> <AncillaryInputGranules> </AncillaryInputGranules> </GranuleURMetaData></GranuleMetaDataFile>"; String platform "Aqua, Terra"; String instrument "MODIS"; String processing_level "L3"; String format "NetCDF4"; String title "Aqua/Terra MODIS Cloud Properties Level 3 monthly, 1x1 degree grid (MCD06COSP_M3_MODIS)"; String long_name "MODIS (Aqua/Terra) Cloud Properties Level 3 monthly, 1x1 degree grid"; String version_id "061"; Float64 geospatial_lat_max 90.00000000000000; Float64 geospatial_lat_min -90.00000000000000; Float64 geospatial_lon_min 180.0000000000000; Float64 geospatial_lon_max -180.0000000000000; Float64 NorthBoundingCoordinate 90.00000000000000; Float64 SouthBoundingCoordinate -90.00000000000000; Float64 EastBoundingCoordinate 180.0000000000000; Float64 WestBoundingCoordinate -180.0000000000000; Float64 latitude_resolution 1.000000000000000; Float64 longitude_resolution 1.000000000000000; String license 
"http://science.nasa.gov/earth-science/earth-science-data/data-information-policy/"; String stdname_vocabulary "NetCDF Climate and Forecast (CF) Metadata Convention"; String keywords_vocabulary "NASA Global Change Master Directory (GCMD) Science Keywords"; String keywords "EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD MICROPHYSICS > CLOUD OPTICAL DEPTH/THICKNESS, EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD PROPERTIES > CLOUD TOP HEIGHT, EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD PROPERTIES > CLOUD FRACTION"; String naming_authority "gov.nasa.gsfc.sci.atmos"; }}
And the resulting dataset has no variables:
print(ds)
<xarray.Dataset>
Dimensions: (latitude: 180, longitude: 360)
Coordinates:
* latitude (latitude) float64 -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
* longitude (longitude) float64 -179.5 -178.5 -177.5 ... 177.5 178.5 179.5
Data variables:
*empty*
Perhaps I am missing some essential keyword argument(s) for xr.open_dataset?
@cisaacstern Thanks for this. I'll catch up later this week, but meanwhile, perhaps you can try passing the engine='netcdf4' keyword to xr.open_dataset?
This error message is coming from the netCDF4 C library.
import netCDF4
url = 'http://ladsweb.modaps.eosdis.nasa.gov/opendap/hyrax/allData/61/MCD06COSP_M3_MODIS/2002/182/MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc'
ds = netCDF4.Dataset(url, "r")
This means that the Hyrax server is emitting data that cannot be properly parsed by the official Unidata netCDF4 library. This is a problem with the server and needs to be brought to the attention of the NASA system administrator.
Is there a direct link to netCDF file download (rather than OPeNDAP endpoint)?
One can access the files through a GUI by appending .dmr.html to the file URL. That provides a button where one can download the data in several formats, but I haven't been able to see the underlying URLs yet.
That website - https://ladsweb.modaps.eosdis.nasa.gov/opendap/hyrax/allData/61/MCD06COSP_M3_MODIS/2002/182/MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc.dmr.html - does not show any data variables either, just lon and lat.

To get variables one has to open a group, e.g. Cloud_Optical_Thickness_Liquid
I cannot discover any groups from that opendap url.
import netCDF4
url = 'http://ladsweb.modaps.eosdis.nasa.gov/opendap/hyrax/allData/61/MCD06COSP_M3_MODIS/2002/182/MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc'
ds = netCDF4.Dataset(url, "r")
print(ds.groups) # --> {}
Where are you inputting the group information when you access the data?
My attention is a little split these days, sorry. Like you both, I have been unable to open the files remotely via OpenDAP. I will see what I can learn from NASA but they have not been very responsive. I will also see if I can sleuth out direct download links, which I have not been able to find anywhere obvious.
Once a file is downloaded, I've been able to see data with e.g.
import xarray as xr
file = 'MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc'
f = xr.open_dataset(file, engine='netcdf4', group='Cloud_Mask_Fraction')
Now the server is returning a 500 server error
https://ladsweb.modaps.eosdis.nasa.gov/opendap/hyrax/allData/61/MCD06COSP_M3_MODIS/2002/182/MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc.dmr.html

The html page is back up. Inspecting the source there reveals that appending .dap.nc4 is the path to direct download:
<input type="button" value="Get as NetCDF 4" onclick="getAs_button_action('NetCDF-4 Data', '.dap.nc4')">
Amending the earlier make_url function accordingly, I can now download the source files. What appear to be the group names ('Cloud_Mask_Fraction', etc.) are discoverable in the dataset's ds.YAML_config attribute, but none of these names are openable as groups using the syntax provided in https://github.com/pangeo-forge/staged-recipes/issues/125#issuecomment-1075838942:
import fsspec
import pandas as pd
import requests
import xarray as xr
import yaml

BASE_URL = "http://ladsweb.modaps.eosdis.nasa.gov"
DATASET_ID = "61/MCD06COSP_M3_MODIS"
dates = pd.date_range("2002-07-01", "2021-07-01", freq="MS")  # "MS" for "month start"


def make_url(date):
    """Make a NetCDF4 download url for NASA MODIS-COSP data based on an input date.

    :param date: A member of the ``pandas.core.indexes.datetimes.DatetimeIndex``
        created with ``dates = pd.date_range("2002-07-01", "2021-07-01", freq="MS")``.
    """
    day_of_year = date.timetuple().tm_yday
    response = requests.get(
        f"{BASE_URL}/archive/allData/{DATASET_ID}/{date.year}/{day_of_year}.json"
    )
    filename = [r["name"] for r in response.json()].pop(0)
    return f"{BASE_URL}/opendap/hyrax/allData/{DATASET_ID}/{date.year}/{day_of_year}/{filename}.dap.nc4"


test_filename = "test.nc"
with fsspec.open(make_url(dates[0])) as src:
    with open(test_filename, mode="wb") as dst:
        dst.write(src.read())

ds = xr.open_dataset(test_filename, engine='netcdf4')
yaml_config = yaml.safe_load(ds.YAML_config)
group_name_pairs = [(v["name_in"], v["name_out"]) for v in yaml_config["variable_settings"]]
for pair in group_name_pairs:
    for group in pair:
        try:
            ds = xr.open_dataset(test_filename, engine='netcdf4', group=group)
        except OSError as e:
            print(e)
[Errno group not found: Solar_Zenith] 'Solar_Zenith'
[Errno group not found: Solar_Zenith] 'Solar_Zenith'
[Errno group not found: Solar_Azimuth] 'Solar_Azimuth'
[Errno group not found: Solar_Azimuth] 'Solar_Azimuth'
[Errno group not found: Sensor_Zenith] 'Sensor_Zenith'
[Errno group not found: Sensor_Zenith] 'Sensor_Zenith'
[Errno group not found: Sensor_Azimuth] 'Sensor_Azimuth'
[Errno group not found: Sensor_Azimuth] 'Sensor_Azimuth'
[Errno group not found: Cloud_Top_Pressure] 'Cloud_Top_Pressure'
[Errno group not found: Cloud_Top_Pressure] 'Cloud_Top_Pressure'
[Errno group not found: Cloud_Fraction] 'Cloud_Fraction'
[Errno group not found: Cloud_Mask_Fraction] 'Cloud_Mask_Fraction'
[Errno group not found: Cloud_Fraction] 'Cloud_Fraction'
[Errno group not found: Cloud_Mask_Fraction_Low] 'Cloud_Mask_Fraction_Low'
[Errno group not found: Cloud_Fraction] 'Cloud_Fraction'
[Errno group not found: Cloud_Mask_Fraction_Mid] 'Cloud_Mask_Fraction_Mid'
[Errno group not found: Cloud_Fraction] 'Cloud_Fraction'
[Errno group not found: Cloud_Mask_Fraction_High] 'Cloud_Mask_Fraction_High'
[Errno group not found: Cloud_Optical_Thickness] 'Cloud_Optical_Thickness'
[Errno group not found: Cloud_Optical_Thickness_Liquid] 'Cloud_Optical_Thickness_Liquid'
[Errno group not found: Cloud_Optical_Thickness] 'Cloud_Optical_Thickness'
[Errno group not found: Cloud_Optical_Thickness_Ice] 'Cloud_Optical_Thickness_Ice'
[Errno group not found: Cloud_Optical_Thickness] 'Cloud_Optical_Thickness'
[Errno group not found: Cloud_Optical_Thickness_Total] 'Cloud_Optical_Thickness_Total'
[Errno group not found: Cloud_Optical_Thickness_PCL] 'Cloud_Optical_Thickness_PCL'
[Errno group not found: Cloud_Optical_Thickness_PCL_Total] 'Cloud_Optical_Thickness_PCL_Total'
[Errno group not found: Cloud_Optical_Thickness_Log] 'Cloud_Optical_Thickness_Log'
[Errno group not found: Cloud_Optical_Thickness_Log10_Liquid] 'Cloud_Optical_Thickness_Log10_Liquid'
[Errno group not found: Cloud_Optical_Thickness_Log] 'Cloud_Optical_Thickness_Log'
[Errno group not found: Cloud_Optical_Thickness_Log10_Ice] 'Cloud_Optical_Thickness_Log10_Ice'
[Errno group not found: Cloud_Optical_Thickness_Log] 'Cloud_Optical_Thickness_Log'
[Errno group not found: Cloud_Optical_Thickness_Log10_Total] 'Cloud_Optical_Thickness_Log10_Total'
[Errno group not found: Cloud_Effective_Radius] 'Cloud_Effective_Radius'
[Errno group not found: Cloud_Particle_Size_Liquid] 'Cloud_Particle_Size_Liquid'
[Errno group not found: Cloud_Effective_Radius] 'Cloud_Effective_Radius'
[Errno group not found: Cloud_Particle_Size_Ice] 'Cloud_Particle_Size_Ice'
[Errno group not found: Cloud_Water_Path] 'Cloud_Water_Path'
[Errno group not found: Cloud_Water_Path_Liquid] 'Cloud_Water_Path_Liquid'
[Errno group not found: Cloud_Water_Path] 'Cloud_Water_Path'
[Errno group not found: Cloud_Water_Path_Ice] 'Cloud_Water_Path_Ice'
[Errno group not found: COPR_Liquid] 'COPR_Liquid'
[Errno group not found: Cloud_Retrieval_Fraction_Liquid] 'Cloud_Retrieval_Fraction_Liquid'
[Errno group not found: COPR_Ice] 'COPR_Ice'
[Errno group not found: Cloud_Retrieval_Fraction_Ice] 'Cloud_Retrieval_Fraction_Ice'
[Errno group not found: COPR_Combined] 'COPR_Combined'
[Errno group not found: Cloud_Retrieval_Fraction_Total] 'Cloud_Retrieval_Fraction_Total'
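As an aside, the URL construction in make_url can be separated from the network lookup so that it can be checked offline. The following is a minimal sketch, not part of the code above. The helper name build_download_url is hypothetical. It zero-pads the day of year to three digits, on the assumption that the archive directories follow the same convention as the A2002182 stamp in the filename; the code above does not pad.

```python
from datetime import date

BASE_URL = "http://ladsweb.modaps.eosdis.nasa.gov"
DATASET_ID = "61/MCD06COSP_M3_MODIS"


def build_download_url(day: date, filename: str) -> str:
    """Build the Hyrax '.dap.nc4' URL for a file whose name is already known.

    Hypothetical helper: it performs no network access and does not check
    that the file exists on the server.
    """
    day_of_year = day.timetuple().tm_yday  # 1-based day of year
    return (
        f"{BASE_URL}/opendap/hyrax/allData/{DATASET_ID}/"
        # Assumption: directories use a three-digit, zero-padded day of year.
        f"{day.year}/{day_of_year:03d}/{filename}.dap.nc4"
    )


url = build_download_url(
    date(2002, 7, 1), "MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc"
)
print(url)
```

For July 2002 this reproduces the 2002/182 path used in the URLs earlier in the thread. The padding only matters for files dated before day 100 of a year, such as the January, February and March monthly files.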
Here's the full YAML config:
{'grid_settings': {'gridsize': 1,
'projection': 'conformal',
'lat_in': 'Latitude',
'lon_in': 'Longitude',
'lat_out': 'Latitude',
'lon_out': 'Longitude',
'fill_value': -999},
'variable_settings': [{'name_in': 'Solar_Zenith',
'name_out': 'Solar_Zenith',
'attributes': [{'name': 'long_name',
'value': 'Solar Zenith Angle (Cell to Sun) for Daytime Scenes'},
{'name': 'units', 'value': 'degrees'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 0.0},
{'name': 'valid_max', 'value': 180.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'masks': ['Mask_Day']},
{'name_in': 'Solar_Azimuth',
'name_out': 'Solar_Azimuth',
'attributes': [{'name': 'long_name',
'value': 'Solar Azimuth Angle (Cell to Sun) for Daytime Scenes'},
{'name': 'units', 'value': 'degrees'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': -180.0},
{'name': 'valid_max', 'value': 180.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'masks': ['Mask_Day']},
{'name_in': 'Sensor_Zenith',
'name_out': 'Sensor_Zenith',
'attributes': [{'name': 'long_name',
'value': 'Sensor Zenith Angle (Cell to Sensor) for Daytime Scenes'},
{'name': 'units', 'value': 'degrees'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 0.0},
{'name': 'valid_max', 'value': 180.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'masks': ['Mask_Day']},
{'name_in': 'Sensor_Azimuth',
'name_out': 'Sensor_Azimuth',
'attributes': [{'name': 'long_name',
'value': 'Sensor Azimuth Angle (Cell to Sensor) for Daytime Scenes'},
{'name': 'units', 'value': 'degrees'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': -180.0},
{'name': 'valid_max', 'value': 180.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'masks': ['Mask_Day']},
{'name_in': 'Cloud_Top_Pressure',
'name_out': 'Cloud_Top_Pressure',
'attributes': [{'name': 'long_name',
'value': 'Cloud Top Pressure for Daytime Scenes'},
{'name': 'units', 'value': 'mb'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 1.0},
{'name': 'valid_max', 'value': 1100.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'masks': ['Mask_Day']},
{'name_in': 'Cloud_Fraction',
'name_out': 'Cloud_Mask_Fraction',
'attributes': [{'name': 'long_name',
'value': 'Cloud Fraction from Cloud Mask for Daytime Scenes'},
{'name': 'units', 'value': 'none'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 0.0},
{'name': 'valid_max', 'value': 1.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'masks': ['Mask_Day']},
{'name_in': 'Cloud_Fraction',
'name_out': 'Cloud_Mask_Fraction_Low',
'attributes': [{'name': 'long_name',
'value': 'Cloud Fraction from Cloud Mask (Low, CTP GE 680 hPa) for Daytime Scenes'},
{'name': 'units', 'value': 'none'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 0.0},
{'name': 'valid_max', 'value': 1.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'masks': ['Mask_Day', 'Mask_Low']},
{'name_in': 'Cloud_Fraction',
'name_out': 'Cloud_Mask_Fraction_Mid',
'attributes': [{'name': 'long_name',
'value': 'Cloud Fraction from Cloud Mask (Mid, 680 hPa GT CTP GE 440 hPa) for Daytime Scenes'},
{'name': 'units', 'value': 'none'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 0.0},
{'name': 'valid_max', 'value': 1.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'masks': ['Mask_Day', 'Mask_Middle']},
{'name_in': 'Cloud_Fraction',
'name_out': 'Cloud_Mask_Fraction_High',
'attributes': [{'name': 'long_name',
'value': 'Cloud Fraction from Cloud Mask (High, CTP LT 440 hPa) for Daytime Scenes'},
{'name': 'units', 'value': 'none'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 0.0},
{'name': 'valid_max', 'value': 1.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'masks': ['Mask_Day', 'Mask_High']},
{'name_in': 'Cloud_Optical_Thickness',
'name_out': 'Cloud_Optical_Thickness_Liquid',
'attributes': [{'name': 'long_name',
'value': 'Cloud Optical Thickness for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
{'name': 'units', 'value': 'none'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 0.0},
{'name': 'valid_max', 'value': 150.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'2D_histograms': [{'name_out': 'JHisto_vs_Cloud_Particle_Size_Liquid',
'primary_var': {'edges': [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0]},
'joint_var': {'name_in': 'Cloud_Effective_Radius',
'edges': [4.0, 8.0, 10.0, 13.0, 15.0, 20.0, 30.0]}}],
'masks': ['Mask_Valid_Range_CER', 'Mask_Liquid_Water_Phase_Clouds']},
{'name_in': 'Cloud_Optical_Thickness',
'name_out': 'Cloud_Optical_Thickness_Ice',
'attributes': [{'name': 'long_name',
'value': 'Cloud Optical Thickness for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
{'name': 'units', 'value': 'none'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 0.0},
{'name': 'valid_max', 'value': 150.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'2D_histograms': [{'name_out': 'JHisto_vs_Cloud_Particle_Size_Ice',
'primary_var': {'edges': [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0]},
'joint_var': {'name_in': 'Cloud_Effective_Radius',
'edges': [5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]}}],
'masks': ['Mask_Ice_Phase_Clouds']},
{'name_in': 'Cloud_Optical_Thickness',
'name_out': 'Cloud_Optical_Thickness_Total',
'attributes': [{'name': 'long_name',
'value': 'Cloud Optical Thickness for Combined (LiquidWater+Ice+Undetermined) Phase Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
{'name': 'units', 'value': 'none'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 0.0},
{'name': 'valid_max', 'value': 150.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'2D_histograms': [{'name_out': 'JHisto_vs_Cloud_Top_Pressure',
'primary_var': {'edges': [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0]},
'joint_var': {'name_in': 'Cloud_Top_Pressure',
'edges': [0.0, 180.0, 310.0, 440.0, 560.0, 680.0, 800.0, 10000.0]}}],
'masks': ['Mask_Valid_Range_CER', 'Mask_Combined_Phase_Clouds']},
{'name_in': 'Cloud_Optical_Thickness_PCL',
'name_out': 'Cloud_Optical_Thickness_PCL_Total',
'only_histograms': None,
'attributes': [{'name': 'long_name',
'value': 'Cloud Optical Thickness for Combined (LiquidWater+Ice+Undetermined) Phase Clouds (3.7 micron Retrieval for Partly Cloudy (PCL) Scenes)'},
{'name': 'units', 'value': 'none'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 0.0},
{'name': 'valid_max', 'value': 150.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'2D_histograms': [{'name_out': 'JHisto_vs_Cloud_Top_Pressure',
'primary_var': {'edges': [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0]},
'joint_var': {'name_in': 'Cloud_Top_Pressure',
'edges': [0.0, 180.0, 310.0, 440.0, 560.0, 680.0, 800.0, 10000.0]}}],
'masks': ['Mask_Valid_Range_CERPCL', 'Mask_Combined_Phase_Clouds']},
{'name_in': 'Cloud_Optical_Thickness_Log',
'name_out': 'Cloud_Optical_Thickness_Log10_Liquid',
'attributes': [{'name': 'long_name',
'value': 'Cloud Optical Thickness Log10 for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
{'name': 'units', 'value': 'none'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': -2.0},
{'name': 'valid_max', 'value': 2.176},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'masks': ['Mask_Valid_Range_CER', 'Mask_Liquid_Water_Phase_Clouds']},
{'name_in': 'Cloud_Optical_Thickness_Log',
'name_out': 'Cloud_Optical_Thickness_Log10_Ice',
'attributes': [{'name': 'long_name',
'value': 'Cloud Optical Thickness Log10 for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
{'name': 'units', 'value': 'none'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': -2.0},
{'name': 'valid_max', 'value': 2.176},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'masks': ['Mask_Ice_Phase_Clouds']},
{'name_in': 'Cloud_Optical_Thickness_Log',
'name_out': 'Cloud_Optical_Thickness_Log10_Total',
'attributes': [{'name': 'long_name',
'value': 'Cloud Optical Thickness Log10 for Combined (LiquidWater+Ice+Undetermined) Phase Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
{'name': 'units', 'value': 'none'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': -2.0},
{'name': 'valid_max', 'value': 2.176},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'masks': ['Mask_Valid_Range_CER', 'Mask_Combined_Phase_Clouds']},
{'name_in': 'Cloud_Effective_Radius',
'name_out': 'Cloud_Particle_Size_Liquid',
'attributes': [{'name': 'long_name',
'value': 'Cloud Effective Radius for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
{'name': 'units', 'value': 'microns'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 4.0},
{'name': 'valid_max', 'value': 30.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'masks': ['Mask_Valid_Range_CER', 'Mask_Liquid_Water_Phase_Clouds']},
{'name_in': 'Cloud_Effective_Radius',
'name_out': 'Cloud_Particle_Size_Ice',
'attributes': [{'name': 'long_name',
'value': 'Cloud Effective Radius for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
{'name': 'units', 'value': 'microns'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 5.0},
{'name': 'valid_max', 'value': 60.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'masks': ['Mask_Ice_Phase_Clouds']},
{'name_in': 'Cloud_Water_Path',
'name_out': 'Cloud_Water_Path_Liquid',
'attributes': [{'name': 'long_name',
'value': 'Cloud Water Path for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
{'name': 'units', 'value': 'g/m^2'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 0.0},
{'name': 'valid_max', 'value': 3000.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'masks': ['Mask_Valid_Range_CER', 'Mask_Liquid_Water_Phase_Clouds']},
{'name_in': 'Cloud_Water_Path',
'name_out': 'Cloud_Water_Path_Ice',
'attributes': [{'name': 'long_name',
'value': 'Cloud Water Path for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
{'name': 'units', 'value': 'g/m^2'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 0.0},
{'name': 'valid_max', 'value': 6000.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}],
'masks': ['Mask_Ice_Phase_Clouds']},
{'name_in': 'COPR_Liquid',
'name_out': 'Cloud_Retrieval_Fraction_Liquid',
'attributes': [{'name': 'long_name',
'value': 'Cloud Optical Properties Retrieval Fraction (Liquid Water Clouds)'},
{'name': 'units', 'value': 'none'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 0.0},
{'name': 'valid_max', 'value': 1.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}]},
{'name_in': 'COPR_Ice',
'name_out': 'Cloud_Retrieval_Fraction_Ice',
'attributes': [{'name': 'long_name',
'value': 'Cloud Optical Properties Retrieval Fraction (Ice Clouds)'},
{'name': 'units', 'value': 'none'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 0.0},
{'name': 'valid_max', 'value': 1.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}]},
{'name_in': 'COPR_Combined',
'name_out': 'Cloud_Retrieval_Fraction_Total',
'attributes': [{'name': 'long_name',
'value': 'Cloud Optical Properties Retrieval Fraction (Combined (LiquidWater+Ice+Undetermined) Phase Clouds)'},
{'name': 'units', 'value': 'none'},
{'name': '_FillValue', 'value': -999.0},
{'name': 'valid_min', 'value': 0.0},
{'name': 'valid_max', 'value': 1.0},
{'name': 'scale_factor', 'value': 1.0},
{'name': 'add_offset', 'value': 0.0}]}]}
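Since the joint histograms are much larger than the scalar fields, it may help to list which output groups declare one. The sketch below assumes only the dictionary structure shown above. The function name summarize_groups is hypothetical, and the example dictionary is a two-entry excerpt of the config with the attributes omitted.

```python
def summarize_groups(config: dict) -> dict:
    """Map each output group name to the joint histograms it declares."""
    summary = {}
    for entry in config["variable_settings"]:
        # Entries without a '2D_histograms' key carry scalar statistics only.
        histograms = [h["name_out"] for h in entry.get("2D_histograms", [])]
        summary[entry["name_out"]] = histograms
    return summary


# Two entries excerpted from the config above (attributes and masks omitted).
example_config = {
    "variable_settings": [
        {"name_in": "Cloud_Fraction", "name_out": "Cloud_Mask_Fraction"},
        {
            "name_in": "Cloud_Optical_Thickness",
            "name_out": "Cloud_Optical_Thickness_Liquid",
            "2D_histograms": [
                {"name_out": "JHisto_vs_Cloud_Particle_Size_Liquid"}
            ],
        },
    ]
}

summary = summarize_groups(example_config)
with_histograms = [name for name, hists in summary.items() if hists]
print(with_histograms)
```

Applied to the full config, the same function would take the result of yaml.safe_load(ds.YAML_config) as its argument.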
@cisaacstern Is this part of the code supposed to create a copy?
with fsspec.open(make_url(dates[0])) as src:
    with open(test_filename, mode="wb") as dst:
        dst.write(src.read())
Because the file created is much smaller than the original:
% ls -lt *.nc
-rw-r--r--@ 1 robert staff 47668 Mar 23 15:32 test.nc
-rw-r--r--@ 1 robert staff 40091481 Sep 21 2021 MCD06COSP_M3_MODIS.A2021182.061.2021250210032.nc
Yes, that's the code block which aims to download the file.
How did you get this 40 MB MCD06COSP_M3_MODIS.A2021182.061.2021250210032.nc?
When I navigate to the GUI at
https://ladsweb.modaps.eosdis.nasa.gov/opendap/hyrax/allData/61/MCD06COSP_M3_MODIS/2002/182/MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc.dmr.html
and click "Get as NetCDF 4", the file my web browser downloads is 47668 bytes
➜ Downloads ls -lt *.nc4
-rw-r--r--@ 1 charlesstern staff 47668 Mar 23 12:56 MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc.nc4
which is the same size as the test.nc retrieved by that code block.
Looks like your 40 MB MCD06COSP_M3_MODIS.A2021182.061.2021250210032.nc was downloaded last September? Perhaps this Hyrax server is truly just not working right now, as Ryan previously hypothesized?
... hmm, on closer reading your file has an updated_at slug of 2021250210032 whereas somehow I'm pointing at 2020181145824, which is an older version... I'm going to look into that now.
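The version slugs compared above can be read programmatically. The sketch below assumes the filename layout product.AYYYYDDD.collection.YYYYDDDHHMMSS.nc. That layout is inferred from the filenames in this thread and from the ProductionDateTime in the file metadata, not from an official specification. The names GranuleName and parse_granule_name are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class GranuleName:
    product: str
    data_date: date      # first day covered by the file
    collection: str
    produced: datetime   # production time stamp (the "updated_at" slug)


def parse_granule_name(filename: str) -> GranuleName:
    """Split a granule filename into its dot-separated components."""
    product, acquired, collection, stamp, _ext = filename.split(".")
    # 'A2002182' -> year 2002, day of year 182
    data_date = datetime.strptime(acquired, "A%Y%j").date()
    # '2020181145824' -> year, day of year, hour, minute, second
    produced = datetime.strptime(stamp, "%Y%j%H%M%S")
    return GranuleName(product, data_date, collection, produced)


candidates = [
    "MCD06COSP_M3_MODIS.A2021182.061.2021250210032.nc",
    "MCD06COSP_M3_MODIS.A2021182.061.2022052174444.nc",
]
# Pick the most recently produced version of the same month.
newest = max(candidates, key=lambda name: parse_granule_name(name).produced)
print(newest)
```

Sorting on the parsed production time, rather than taking the first entry of a directory listing, is one way to avoid pointing at a superseded version.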
The better comparison would be to
-rw-r--r--@ 1 robert staff 68513011 Mar 1 14:00 MCD06COSP_M3_MODIS.A2021182.061.2022052174444.nc
Thanks for these helpful clarifications re: expected data size, Robert. I've made considerable headway with both file retrieval and a draft of the recipes themselves. Buckle up for a longish but hopefully useful post.
Exploring the LAADS DAAC website a bit turned up the HTTP file service. For example, the page at
https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MCD06COSP_M3_MODIS/2002/182/
demonstrates a wget example using the authentication option
wget ... --header "Authorization: Bearer INSERT_DOWNLOAD_TOKEN_HERE"
After generating a token according to these instructions and exporting it as the EARTHDATA_TOKEN env variable, this authentication style can be adapted to download a complete file via fsspec as follows:
import os
import fsspec
base_url = (
    "https://ladsweb.modaps.eosdis.nasa.gov/"
    "archive/allData/61/MCD06COSP_M3_MODIS/2002/182"
)
filename = "MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc"
with fsspec.open(
    f"{base_url}/{filename}",
    client_kwargs=dict(headers=dict(Authorization=f"Bearer {os.environ['EARTHDATA_TOKEN']}")),
) as src:
    with open(filename, mode="wb") as dst:
        dst.write(src.read())
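Given the unexpectedly small 47668-byte downloads earlier in this thread, it may be worth sanity-checking a download before opening it. The sketch below is one possible check, not part of the recipe: netCDF4 files are HDF5 containers and normally begin with the 8-byte HDF5 signature. The helper name `looks_like_netcdf4` and the `min_bytes` threshold are assumptions made for illustration.

```python
import os

# The 8-byte signature that starts an HDF5 (and hence a typical netCDF4) file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_netcdf4(path, min_bytes=1_000_000):
    """Return True if `path` starts with the HDF5 signature and is not suspiciously small.

    `min_bytes` is an arbitrary illustrative threshold: the monthly files
    discussed above are tens of MB, while the small downloads were ~47 kB.
    """
    if os.path.getsize(path) < min_bytes:
        return False
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

Calling `looks_like_netcdf4(filename)` right after the download would flag a truncated or non-netCDF response before xarray tries to open it.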
The resulting file has an openable group for each of the names provided in its ds.YAML_config
import xarray as xr
import yaml
ds = xr.open_dataset(filename)
yaml_config = yaml.safe_load(ds.YAML_config)
group_names = [v["name_out"] for v in yaml_config["variable_settings"]]
has_groups = []
for group in group_names:
    try:
        ds = xr.open_dataset(filename, group=group)
    except OSError as e:
        print(e)
    else:
        has_groups.append(group)
print(has_groups)
['Solar_Zenith', 'Solar_Azimuth', 'Sensor_Zenith', 'Sensor_Azimuth', 'Cloud_Top_Pressure', 'Cloud_Mask_Fraction', 'Cloud_Mask_Fraction_Low', 'Cloud_Mask_Fraction_Mid', 'Cloud_Mask_Fraction_High', 'Cloud_Optical_Thickness_Liquid', 'Cloud_Optical_Thickness_Ice', 'Cloud_Optical_Thickness_Total', 'Cloud_Optical_Thickness_PCL_Total', 'Cloud_Optical_Thickness_Log10_Liquid', 'Cloud_Optical_Thickness_Log10_Ice', 'Cloud_Optical_Thickness_Log10_Total', 'Cloud_Particle_Size_Liquid', 'Cloud_Particle_Size_Ice', 'Cloud_Water_Path_Liquid', 'Cloud_Water_Path_Ice', 'Cloud_Retrieval_Fraction_Liquid', 'Cloud_Retrieval_Fraction_Ice', 'Cloud_Retrieval_Fraction_Total']
With this file access knowledge in hand, we can write a dictionary containing a naive XarrayZarrRecipe for each group as follows.
Note: Each of these recipes concatenates the given group into a time series spanning all months covered in the dates sequence. To make this possible, I define a process_input function which adds the "date" dimension to each group, because as provided by LAADS DAAC the groups do not have any temporal dimension along which to concatenate.
import os
import pandas as pd
import requests
from pangeo_forge_recipes.patterns import ConcatDim, FilePattern
from pangeo_forge_recipes.recipes import XarrayZarrRecipe
GROUPS = [
    'Solar_Zenith',
    'Solar_Azimuth',
    'Sensor_Zenith',
    'Sensor_Azimuth',
    'Cloud_Top_Pressure',
    'Cloud_Mask_Fraction',
    'Cloud_Mask_Fraction_Low',
    'Cloud_Mask_Fraction_Mid',
    'Cloud_Mask_Fraction_High',
    'Cloud_Optical_Thickness_Liquid',
    'Cloud_Optical_Thickness_Ice',
    'Cloud_Optical_Thickness_Total',
    'Cloud_Optical_Thickness_PCL_Total',
    'Cloud_Optical_Thickness_Log10_Liquid',
    'Cloud_Optical_Thickness_Log10_Ice',
    'Cloud_Optical_Thickness_Log10_Total',
    'Cloud_Particle_Size_Liquid',
    'Cloud_Particle_Size_Ice',
    'Cloud_Water_Path_Liquid',
    'Cloud_Water_Path_Ice',
    'Cloud_Retrieval_Fraction_Liquid',
    'Cloud_Retrieval_Fraction_Ice',
    'Cloud_Retrieval_Fraction_Total',
]
BASE_URL = "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MCD06COSP_M3_MODIS"
dates = pd.date_range("2002-07-01", "2021-07-01", freq="MS") # "MS" for "month start"
concat_dim = ConcatDim("date", keys=dates, nitems_per_file=1)
def make_url(date):
    """Make a NetCDF4 download url for NASA MODIS-COSP data based on an input date.

    :param date: A member of the ``pandas.core.indexes.datetimes.DatetimeIndex``
        created with ``dates = pd.date_range("2002-07-01", "2021-07-01", freq="MS")``.
    """
    # Zero-padded, three-digit day of year (e.g. "182", "001"); the padding assumes
    # the archive's directory names are always three digits wide.
    day_of_year = date.strftime("%j")
    response = requests.get(f"{BASE_URL}/{date.year}/{day_of_year}.json")
    filename = [r["name"] for r in response.json()].pop(0)
    return f"{BASE_URL}/{date.year}/{day_of_year}/{filename}"
pattern = FilePattern(
    make_url,
    concat_dim,
    fsspec_open_kwargs={
        "client_kwargs": dict(headers=dict(Authorization=f"Bearer {os.environ['EARTHDATA_TOKEN']}"))
    },
)
def process_input(ds, filename):
    """Add missing "date" dimension to dataset to facilitate concatenation.
    """
    import xarray as xr
    return xr.concat([ds], dim="date")
per_group_recipes = {
    group: XarrayZarrRecipe(
        pattern,
        xarray_open_kwargs=dict(group=group),
        process_input=process_input,
    )
    for group in GROUPS
}
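The year/day-of-year directory fragment that make_url relies on can be checked offline. The sketch below uses only the standard library. `doy_path` is a hypothetical helper, and the three-digit zero padding is an assumption about the archive layout, consistent with the /2002/182/ example above.

```python
from datetime import date


def doy_path(d):
    """Return the 'YYYY/DDD' path fragment for a date, with DDD the zero-padded day of year."""
    return f"{d.year}/{d.timetuple().tm_yday:03d}"


# July 1 falls on day 182 in common years and day 183 in leap years,
# so the day-of-year directory is not the same for every July file.
for d in (date(2002, 7, 1), date(2004, 7, 1), date(2003, 1, 1)):
    print(d.isoformat(), "->", doy_path(d))
```

This is why the directory has to be computed per date rather than hard-coded per calendar month.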
We cannot execute these recipes on Pangeo Forge Cloud yet, because we don't yet have a mechanism to securely manage credentials (xref https://github.com/pangeo-forge/roadmap/pull/36). However, I did execute a 2-month temporal subset of each of these recipes locally (and anyone else can too) with the following code:
NOTE: Running the code below will create 23 new subdirectories (i.e. Zarr stores, which are directories) within the current working directory.
from fsspec.implementations.local import LocalFileSystem
from pangeo_forge_recipes.recipes import setup_logging
from pangeo_forge_recipes.storage import CacheFSSpecTarget, FSSpecTarget
fs_local = LocalFileSystem()
setup_logging("DEBUG")
for group_name, recipe in per_group_recipes.items():
    print(f"\n\n Building {group_name} onto local storage...")
    recipe.storage_config.cache = CacheFSSpecTarget(fs_local, "cache")
    recipe.storage_config.target = FSSpecTarget(fs_local, group_name + ".zarr")
    recipe_pruned = recipe.copy_pruned()
    recipe_pruned.to_function()()
and the resulting Zarr stores (one for each group) can be accessed with
import xarray as xr
ds = xr.open_zarr(f"{group_name}.zarr", consolidated=True)
By way of conclusion for now: based on this test, I'd estimate that the full temporal scope of each of these recipes would build Zarr stores of between ~1.1 and 12.8 GB per group, with a total dataset size (a full temporal run for each of the 23 groups) of about 69 GB:
all_groups_full_size = 0
for group in GROUPS:
    ds = xr.open_zarr(f"{group}.zarr", consolidated=True)
    group_pruned_size = round(ds.nbytes/1e6)
    group_full_size = group_pruned_size * len(dates)
    print(f"{group} {group_pruned_size} MB -> {group_full_size/1e3} GB")
    all_groups_full_size += group_full_size
print(f"\n{all_groups_full_size/1e3} GB")
Solar_Zenith 5 MB -> 1.145 GB
Solar_Azimuth 5 MB -> 1.145 GB
Sensor_Zenith 5 MB -> 1.145 GB
Sensor_Azimuth 5 MB -> 1.145 GB
Cloud_Top_Pressure 5 MB -> 1.145 GB
Cloud_Mask_Fraction 5 MB -> 1.145 GB
Cloud_Mask_Fraction_Low 5 MB -> 1.145 GB
Cloud_Mask_Fraction_Mid 5 MB -> 1.145 GB
Cloud_Mask_Fraction_High 5 MB -> 1.145 GB
Cloud_Optical_Thickness_Liquid 49 MB -> 11.221 GB
Cloud_Optical_Thickness_Ice 49 MB -> 11.221 GB
Cloud_Optical_Thickness_Total 56 MB -> 12.824 GB
Cloud_Optical_Thickness_PCL_Total 51 MB -> 11.679 GB
Cloud_Optical_Thickness_Log10_Liquid 5 MB -> 1.145 GB
Cloud_Optical_Thickness_Log10_Ice 5 MB -> 1.145 GB
Cloud_Optical_Thickness_Log10_Total 5 MB -> 1.145 GB
Cloud_Particle_Size_Liquid 5 MB -> 1.145 GB
Cloud_Particle_Size_Ice 5 MB -> 1.145 GB
Cloud_Water_Path_Liquid 5 MB -> 1.145 GB
Cloud_Water_Path_Ice 5 MB -> 1.145 GB
Cloud_Retrieval_Fraction_Liquid 5 MB -> 1.145 GB
Cloud_Retrieval_Fraction_Ice 5 MB -> 1.145 GB
Cloud_Retrieval_Fraction_Total 5 MB -> 1.145 GB
68.7 GB
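The extrapolation above multiplies the size of each pruned store by len(dates). If a pruned store holds more than one time step (the test above used a 2-month subset), dividing by the number of time steps first gives a per-month size, and the totals scale down accordingly. A minimal sketch of that arithmetic follows; `extrapolate_nbytes` is a hypothetical helper, and the array shape (five float64 statistics on a 360 x 180 grid per scalar group) is an assumption chosen to be consistent with the ~5 MB figures.

```python
def extrapolate_nbytes(pruned_nbytes, n_pruned_steps, n_full_steps):
    """Scale the size of a pruned store to the full record, per time step."""
    return pruned_nbytes // n_pruned_steps * n_full_steps


# One scalar group: 5 float64 statistics on a 360 x 180 grid, 2 months.
pruned_nbytes = 5 * 360 * 180 * 8 * 2   # 5,184,000 bytes, i.e. about 5 MB
full_nbytes = extrapolate_nbytes(pruned_nbytes, n_pruned_steps=2, n_full_steps=229)
print(f"{full_nbytes / 1e9:.3f} GB")
```

Under these assumptions each scalar group would come to roughly 0.6 GB rather than 1.1 GB, and the total would be about half the figure printed above. These are uncompressed in-memory sizes (ds.nbytes), so the stores on disk may be smaller still.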
@cisaacstern Thanks so much for continuing to work on this; it's spectacular.
I'm not sure how y'all think of things at Pangeo-forge but, from a science user's perspective, there's a lot to be gained by more targeted processing. (By way of background, for some groups we want to extract only one field of four; for other groups we want to do some arithmetic on existing fields.)
My understanding is that I should create a dictionary containing a set of XarrayZarrRecipes, where each process_input keyword points to the appropriate function? For example, I might have extract_selected_fields, which creates a dataset from the Mean variable from a set of groups (renamed to the group name, so Cloud_Top_Pressure.Mean becomes Cloud_Top_Pressure)? And the recipes that share input files will not download the files over and over?
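A minimal sketch of what such an extract_selected_fields function could look like, assuming each group dataset exposes a variable named Mean (as in the Cloud_Top_Pressure.Mean example above). Since process_input only receives ds and filename, the sketch binds the group name with functools.partial; that wiring is an assumption for illustration, not something confirmed by the pangeo-forge-recipes docs. The toy dataset is made up purely so the example runs.

```python
from functools import partial

import numpy as np
import xarray as xr


def extract_selected_fields(ds, filename, group):
    """Keep only the `Mean` variable and rename it to the group name."""
    return ds[["Mean"]].rename_vars({"Mean": group})


# Toy stand-in for one group of one monthly file (values are arbitrary).
toy = xr.Dataset(
    {
        "Mean": (("longitude", "latitude"), np.zeros((360, 180))),
        "Pixel_Counts": (("longitude", "latitude"), np.ones((360, 180))),
    }
)

process_ctp = partial(extract_selected_fields, group="Cloud_Top_Pressure")
out = process_ctp(toy, "example.nc")
print(list(out.data_vars))
```

Any arithmetic on the existing fields could go in the same function before the dataset is returned.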
Is there a way to handle appending new data as it is produced, month by month?
Question: do these groups contain variables with the same dimensions / coordinates? If so, it would make sense logically to merge them into a single dataset. (That is not possible today but would become possible with the Opener refactor.)
All variables share location and time coordinates. I would package all the scalar fields together in a single dataset. There are also some joint histograms with the same location and time coordinates but different histogram bins. Because they don't share bin definitions, and because they're large, I had thought to create separate datasets for each unique set of bin definitions.
There is no inherent size limit to the zarr group, because it is not a single file. It's all about doing whatever is most convenient for the person analyzing the data. In this case, it sounds like we want just one big dataset.
As long as the dimensions use distinct names, we should be fine to merge into a single dataset. I.e. bins: 50 and bins: 70 would cause merge errors, but Cloud_Water_Path_Liquid_bins: 50 and Cloud_Retrieval_Fraction_Ice_bins: 70 would be fine.
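The dimension-name point can be illustrated with two toy datasets. This is a sketch with made-up variables, assuming the histogram dimensions carry no coordinate values; it only shows that a shared dimension name with different lengths is rejected by xr.merge, while group-specific names merge cleanly.

```python
import numpy as np
import xarray as xr

# Two toy variables that both call their histogram dimension "bins",
# but with different lengths (values are arbitrary).
a = xr.Dataset({"a": (("bins",), np.zeros(50))})
b = xr.Dataset({"b": (("bins",), np.zeros(70))})

try:
    xr.merge([a, b])
    conflict = False
except ValueError:
    # xarray refuses to align an unlabeled dimension with two different sizes.
    conflict = True

# Renaming each dimension to something group-specific removes the clash.
merged = xr.merge(
    [
        a.rename_dims({"bins": "Cloud_Water_Path_Liquid_bins"}),
        b.rename_dims({"bins": "Cloud_Retrieval_Fraction_Ice_bins"}),
    ]
)
print(conflict, dict(merged.sizes))
```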
We cannot execute these recipes on Pangeo Forge Cloud yet, because we don't yet have a mechanism to securely manage credentials
Charles, I wonder if it is worthwhile to just special case earthdata login and inject some earthdata login credentials directly into our environments. This would allow us to move forward with some of these recipes before we solve the general secrets problem.
Yes, merging is definitely the way to go. As Ryan said, we'll need https://github.com/pangeo-forge/pangeo-forge-recipes/pull/245 to do this in a single recipe, but we can do it today in two steps, which I've done to complete the end-to-end demonstration.
-
I exported the outputs of each of the recipes in my last comment with ds.to_netcdf and cached those files to our OSN bucket at these publicly accessible paths:
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Solar_Zenith.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Solar_Azimuth.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Sensor_Zenith.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Sensor_Azimuth.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Top_Pressure.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Mask_Fraction.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Mask_Fraction_Low.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Mask_Fraction_Mid.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Mask_Fraction_High.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Optical_Thickness_Liquid.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Optical_Thickness_Ice.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Optical_Thickness_Total.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Optical_Thickness_PCL_Total.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Optical_Thickness_Log10_Liquid.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Optical_Thickness_Log10_Ice.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Optical_Thickness_Log10_Total.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Particle_Size_Liquid.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Particle_Size_Ice.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Water_Path_Liquid.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Water_Path_Ice.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Retrieval_Fraction_Liquid.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Retrieval_Fraction_Ice.nc
https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Retrieval_Fraction_Total.nc
-
I wrote a second recipe to merge these inputs into a single Zarr store:
from pangeo_forge_recipes.patterns import ConcatDim, FilePattern, MergeDim
from pangeo_forge_recipes.recipes import XarrayZarrRecipe

concat_dim = ConcatDim("date", keys=[0,], nitems_per_file=2)
# Here `GROUPS` is the list defined in:
# https://github.com/pangeo-forge/staged-recipes/issues/125#issuecomment-1077053600
merge_dim = MergeDim("group", keys=GROUPS)

def make_url(date, group):
    base_url = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge"
    return f"{base_url}/modis-cosp/cache/{group}.nc"

def process_input(ds, filename):
    """Add a group name abbreviation to each data variable name.
    """
    group = filename.split("/modis-cosp/cache/")[-1].replace(".nc", "")
    abbreviation = (
        "".join([word[0] for word in group.split("_")])  # e.g. 'Cloud_Top_Pressure' -> 'CTP'
        if not group.startswith("S")  # special casing to disambiguate 'Solar_*' & 'Sensor_*'
        else group[:3] + group.split("_")[-1][0]  # e.g. 'Solar_Zenith' -> 'SolZ'; 'Sensor_Zenith' -> 'SenZ'
    )
    return ds.rename_vars({v: f"{abbreviation}_{v}" for v in ds.data_vars})

pattern = FilePattern(make_url, concat_dim, merge_dim)
recipe = XarrayZarrRecipe(pattern, process_input=process_input)
-
I ran this recipe locally and then manually copied the output to our OSN bucket. The resulting Zarr store (2 time steps, 114 data variables) can be opened with:
import fsspec
import xarray as xr

base_url = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge"
dataset_public_url = f"{base_url}/modis-cosp/modis-cosp-demo.zarr"
mapper = fsspec.get_mapper(dataset_public_url)
ds = xr.open_zarr(mapper, consolidated=True)
print(ds)
<xarray.Dataset>
Dimensions:  (date: 2, longitude: 360, latitude: 180,
              jhisto_cloud_optical_thickness_ice_7: 7,
              jhisto_cloud_particle_size_ice_6: 6,
              jhisto_cloud_optical_thickness_liquid_7: 7,
              jhisto_cloud_particle_size_liquid_6: 6,
              jhisto_cloud_optical_thickness_pcl_total_7: 7,
              jhisto_cloud_top_pressure_7: 7,
              jhisto_cloud_optical_thickness_total_7: 7)
Dimensions without coordinates: date, longitude, latitude,
                                jhisto_cloud_optical_thickness_ice_7,
                                jhisto_cloud_particle_size_ice_6,
                                jhisto_cloud_optical_thickness_liquid_7,
                                jhisto_cloud_particle_size_liquid_6,
                                jhisto_cloud_optical_thickness_pcl_total_7,
                                jhisto_cloud_top_pressure_7,
                                jhisto_cloud_optical_thickness_total_7
Data variables: (12/114)
    CMFH_Mean                (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
    CMFH_Pixel_Counts        (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
    CMFH_Standard_Deviation  (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
    CMFH_Sum                 (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
    CMFH_Sum_Squares         (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
    CMFL_Mean                (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
    ...                       ...
    SolA_Sum_Squares         (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
    SolZ_Mean                (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
    SolZ_Pixel_Counts        (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
    SolZ_Standard_Deviation  (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
    SolZ_Sum                 (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
    SolZ_Sum_Squares         (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
Attributes:
    _FillValue:    -999.0
    add_offset:    0.0
    long_name:     Cloud Optical Properties Retrieval Fraction (Combined (Liq...
    scale_factor:  1.0
    units:         none
    valid_max:     1.0
    valid_min:     0.0
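The abbreviation rule used in the merge recipe's process_input can be checked on its own, without any data. The sketch below restates that rule as a standalone helper (the name `abbreviate` is introduced here for illustration) and confirms that it yields a distinct prefix for each of the 23 groups, which the merge needs in order to avoid variable-name collisions.

```python
GROUPS = [
    "Solar_Zenith", "Solar_Azimuth", "Sensor_Zenith", "Sensor_Azimuth",
    "Cloud_Top_Pressure",
    "Cloud_Mask_Fraction", "Cloud_Mask_Fraction_Low",
    "Cloud_Mask_Fraction_Mid", "Cloud_Mask_Fraction_High",
    "Cloud_Optical_Thickness_Liquid", "Cloud_Optical_Thickness_Ice",
    "Cloud_Optical_Thickness_Total", "Cloud_Optical_Thickness_PCL_Total",
    "Cloud_Optical_Thickness_Log10_Liquid", "Cloud_Optical_Thickness_Log10_Ice",
    "Cloud_Optical_Thickness_Log10_Total",
    "Cloud_Particle_Size_Liquid", "Cloud_Particle_Size_Ice",
    "Cloud_Water_Path_Liquid", "Cloud_Water_Path_Ice",
    "Cloud_Retrieval_Fraction_Liquid", "Cloud_Retrieval_Fraction_Ice",
    "Cloud_Retrieval_Fraction_Total",
]


def abbreviate(group):
    """Initials of each word, except 'Solar_*'/'Sensor_*', which keep three letters."""
    if group.startswith("S"):
        return group[:3] + group.split("_")[-1][0]
    return "".join(word[0] for word in group.split("_"))


abbreviations = {group: abbreviate(group) for group in GROUPS}
# Every group must map to a different prefix, or renamed variables would collide.
assert len(set(abbreviations.values())) == len(GROUPS)
print(abbreviations["Cloud_Top_Pressure"], abbreviations["Solar_Zenith"], abbreviations["Sensor_Zenith"])
# prints: CTP SolZ SenZ
```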
I'll respond to the other questions/comments in another comment.
there's a lot to be gained by more targeted processing. ... My understanding is that I should create a dictionary containing a set of XarrayZarrRecipes, where each process_input keyword points to the appropriate function?
Correct. As described in the API Reference, process_input functions must have the signature
def process_input(ds: xr.Dataset, filename: str) -> xr.Dataset
so to use the group name (for renaming variables, etc.) within process_input, you'll need to get it either from the filename (as I did above) or perhaps ds.attrs["long_name"]. And yes, you can apply any arithmetic, etc. within this function as well, and then just return the ds as you'd like it to appear in the recipe's output dataset.
I agree that a great next step would be for you to refine the per-group recipes I prototyped in my earlier comment so that the per-group Zarr stores they output look as you'd like them to. (Merging all these together will become a lot simpler once the above-referenced refactor is complete, so we don't need to worry about that for now.)
As you go along, you can run local tests of your recipes as described in the Running a Recipe Locally docs. Once you hit a point where you have questions, rather than posting your code in comments as I've done here, I'd recommend Making a PR, which will make it easier for me to clone and work with your code.
And the recipes that share input files will not download the files over and over?
Once we've put everything together into one recipe, yes this will be true. In the interim, while we still have a single recipe for each group, that won't happen automatically, because each recipe maintains its own cache. If you get to a point where this becomes a barrier to recipe development, just let me know and I can show you some advanced config to point all of the recipes to a single cache. I'd recommend trying to execute a few recipes first before we get into that, though.
Is there a way to handle appending new data as it is produced, month by month?
This is on the roadmap (xref https://github.com/pangeo-forge/pangeo-forge-recipes/issues/37) but for now the solution would be to just overwrite the original dataset with an updated date range once new data is released. For this particular dataset, that does not concern me too much, because the entire dataset is less than 100 GB, which is on the low end of what our infrastructure is designed to handle, so re-writing the whole thing should be relatively fast (a few hours, maybe).
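Until appending is supported, the overwrite approach amounts to regenerating the list of month starts with a later end date and rebuilding. A standard-library sketch of that bookkeeping follows; the recipe above uses pd.date_range with freq="MS" for the same purpose, and `month_starts` is a hypothetical helper introduced only for illustration.

```python
from datetime import date


def month_starts(first, last):
    """Return the first day of every month from `first` to `last`, inclusive."""
    months = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        months.append(date(year, month, 1))
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


current = month_starts(date(2002, 7, 1), date(2021, 7, 1))
updated = month_starts(date(2002, 7, 1), date(2021, 8, 1))
print(len(current), len(updated))
```

Each newly released month adds one key to the concat dimension, so the rebuilt dataset grows by one time step per update.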
I wonder if it is worthwhile to just special case earthdata login and inject some earthdata login credentials directly into our environments.
Yes, that's a good idea. And we may end up wanting to do the same for other commonly used portals.
I'm not sure how y'all think of things at Pangeo-forge but, from a science user's perspective, there's a lot to be gained by more targeted processing.
Last comment for now, but I wanted to add this because I realized I did not answer the aesthetic dimension of this question. The aim of Pangeo Forge is to produce analysis-ready, cloud-optimized (ARCO) datasets. The XarrayZarrRecipe will take care of the cloud-optimized part, but since you are the domain expert, we defer to you for the analysis-ready part. You should absolutely apply whatever preprocessing will make this data a dream to work with, and which will help you and other scientists minimize, or even eliminate, the latency between opening this dataset and getting started on your/their science. Our ideal world is one in which you open this dataset and breathe a sigh of relief, "Ah, what a relief, this dataset is ready to go!"
@cisaacstern I've cloned this repo and started work on my recipe, building on your generous help. A couple questions arising:
-
As you note, the signature for process_input is process_input(ds: xr.Dataset, filename: str) -> xr.Dataset. My understanding is that ds is the result of ds = xr.open_dataset(filename, **client_kwargs). Is that correct? If so, I guess it's OK to make other calls to xr.open_dataset() with different arguments within the body of process_input()?
-
What is the preferred way at present to loop over a collection of recipes, as you do here, in the current environment?
-
Related: is it ok to have a recipe repo contain several recipes?
In general I would not recommend calling open_dataset from within the preprocessing function. Although I can see how that would be a useful hack for us to get around the fact that we cannot distinguish between different groups at the FilePattern level. So perhaps we do it for now and then refactor later once https://github.com/pangeo-forge/pangeo-forge-recipes/pull/245 is done.
- is it ok to have a recipe repo contain several recipes?
Yes. They just have to be enumerated in meta.yaml.
Thanks for your patience with this Rob. It's very helpful for us to have willing guinea pigs. 🐹
What is the preferred way at present to loop over a collection of recipes, as you do https://github.com/pangeo-forge/staged-recipes/issues/125#issuecomment-1077053600, in the current environment?
Everything in that linked comment should work as-is with the current release of pangeo-forge-recipes.
As I show there, generally I've found the most concise way to define a number of recipes with some overlapping kwargs and some unique kwargs is with a dictionary comprehension. But you can also just write them out, "long-hand", one at a time, which is more verbose but has the benefit of being more easily (human) readable.
For test execution of a collection of recipes, the code in that same linked comment should also work as-is, but certainly let me know if you find otherwise.
@cisaacstern I'm coming back to this project and now have a conda environment that includes the pangeo-forge package. I'm a little unclear how the pieces of code are supposed to fit together. Looking at the other examples in this repo, it seems that recipe.py defines a single recipe that will eventually be executed with recipe.to_function()(). Your comment above goes beyond this, to define a dictionary per_group_recipes with each item being a recipe. You then execute in a loop over the dictionary elements. How would I arrange e.g. recipe.py to do this loop? I realize I could create a separate Python recipe file for each group but that seems like the long way round.