Proposed Recipes for NASA MODIS-COSP data (satellite observations of clouds)

Open RobertPincus opened this issue 2 years ago • 48 comments

Source Dataset

This data provides satellite observations of clouds targeted at the evaluation of global models, facilitated by the use of synthetic observations ("satellite simulators"). The data are a re-packaging of standard "Level-3" (gridded, aggregated) cloud products produced from NASA's MODIS instruments; data from both instruments (morning/Terra and afternoon/Aqua) are combined. The fields conform to output from the "MODIS simulator," one of several used in the CFMIP Observation Simulator Package (COSP, paper1, paper2). Output from COSP and the MODIS simulator is requested as part of the Cloud Feedbacks Model Intercomparison Project (CFMIP), part of CMIP.

  • Data are described at https://ladsweb.modaps.eosdis.nasa.gov/missions-and-measurements/products/MCD06COSP_M3_MODIS.
  • The source files are netCDF4, one per month (a daily product is also available). There are 23 groups, each corresponding to a variable. Each group contains a range of statistical quantities. Some groups contain joint histograms with secondary parameters.
  • The files are available directly, through OPeNDAP (example file), or for staging for download through a GUI.
  • The data record begins in 2002 and is updated monthly as new data is acquired.

Transformation / Alignment / Merging

Ideally we would provide several related datasets. One would contain the mean values for many or even all scalar fields. This means extracting the mean value from each group from each file and concatenating the fields in time. A second would be the joint histograms, which need to be extracted and normalized, metadata refined, and also concatenated in time. Since the joint histograms are roughly 50 times as large as each scalar field it may be best to create one dataset per joint histogram (there are about a dozen).
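As a rough illustration of the "normalize, then concatenate in time" step for the joint histograms, here is a minimal pure-Python sketch. It assumes that "normalized" means dividing each bin count by the total count for that grid cell, which the proposal does not specify; the function name `normalize_histogram` and the tiny monthly inputs are hypothetical stand-ins for data read from the netCDF groups.

```python
def normalize_histogram(counts):
    """Scale a 2-D joint histogram (list of rows) so that all bins sum to 1."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        # No observations in this grid cell: keep zeros rather than divide by zero.
        return [[0.0 for _ in row] for row in counts]
    return [[value / total for value in row] for row in counts]


# Hypothetical monthly inputs: {month label: joint-histogram counts for one grid cell}.
monthly_counts = {
    "2002-09": [[3, 3], [3, 3]],
    "2002-07": [[2, 1], [1, 0]],
    "2002-08": [[0, 0], [0, 0]],
}

# "Concatenate in time": order the months and stack along a new leading axis.
months = sorted(monthly_counts)
stacked = [normalize_histogram(monthly_counts[month]) for month in months]

for month, hist in zip(months, stacked):
    print(month, hist)
```

In a real recipe the same two steps would more likely be done with array libraries on the full latitude/longitude grid; the point here is only the order of operations and the handling of empty cells.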

Output Dataset

Given the user community it would be useful to produce netCDF output, perhaps with a kerchunk index. It would be fine to also produce a Zarr or related dataset, perhaps in addition. Whatever the format, the data should be structured so that it's easy to append new data as it is produced.

RobertPincus avatar Mar 18 '22 15:03 RobertPincus

Thanks for this proposal, @RobertPincus.

If I understand correctly, there are at least two related (but distinct) goals outlined here:

  1. Creating a mirror of the complete dataset(s) in some cloud optimized format
  2. Producing various reductions (means, joint histograms) from the complete dataset(s)

Generally, Pangeo Forge is focused on being very good at goal 1: producing optimized mirrors of archival datasets (in complete form). Once this is accomplished, goal 2 becomes much easier, as the data will be staged in a manner conducive to scalable parallel computation.

In terms of goal 1 (producing the cloud optimized mirror), I note that programmatic distribution of this data is available via OPeNDAP. This would suggest to me that producing a Zarr dataset may be our best option. Kerchunk would be a more efficient option if the data were already staged as netCDFs. Given that is not the case, producing a kerchunk index would entail storing the dataset as netCDFs on the cloud, and then producing the kerchunk indexes: a less efficient (two-step) process than directly creating a Zarr store from the OPeNDAP endpoint.

If starting with creation of a Zarr copy of this dataset is an acceptable starting place, we can use XarrayZarrRecipe to accomplish this. This recipe class supports OPeNDAP inputs.

Is working on this recipe (a few dozen lines of Python code) something you or someone in your group is interested in? If so I can point to the relevant documentation for getting started. If not, we can open this up to others (myself included, perhaps) to collaborate on this development, though note that this latter option may take a bit longer to get spun up.

Looking forward to bringing this vision to life!

cisaacstern avatar Mar 18 '22 16:03 cisaacstern

@cisaacstern, thanks very much for this feedback.

As a point of clarification, the data already contains the means, joint histograms, etc. that we want - they are just accessed via netCDF groups.

One wrinkle in the ointment is that the file names contain the date of production. Since we don't know this date a priori it amounts to a quasi-random string. Do you know if there's a way in OPeNDAP to specify opening files that match a certain pattern including a wildcard?

I'm open to outputting Zarr; if people want to recycle the recipe to make local netCDF mirrors that'll be easy enough. I don't yet understand if, say, one Zarr object is roughly equivalent to a netCDF file, or if a single object could include many variables.

For a first try you can certainly point me to documentation and I can see how far I can get.

Thanks a lot.

RobertPincus avatar Mar 19 '22 02:03 RobertPincus

One wrinkle in the ointment is that the file names contain the date of production. Since we don't know this date a priori it amounts to a quasi-random string.

This is a really annoying feature of many datasets. Do we know if the hyrax server exposes a TDS catalog or any other catalog? If so, we could crawl it to populate the FilePattern.
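Once some listing of file names is available, the unknown production stamp can be handled with a wildcard match on the client side. The sketch below is only an illustration: the helper names `pick_granule` and `production_time` are hypothetical, the second file name in `listing` is made up for the demo, and the reading of the last numeric field as `YYYYDDDHHMMSS` is an assumption inferred from the one example file name in this thread.

```python
from datetime import datetime
from fnmatch import fnmatch

PRODUCT = "MCD06COSP_M3_MODIS"
COLLECTION = "061"


def pick_granule(names, date):
    """Return the single name in `names` whose data date matches `date`."""
    # %j is the zero-padded day of year, so 2002-07-01 becomes "2002182".
    pattern = f"{PRODUCT}.A{date:%Y%j}.{COLLECTION}.*.nc"
    matches = [name for name in names if fnmatch(name, pattern)]
    if len(matches) != 1:
        raise ValueError(f"expected one match for {pattern}, found {len(matches)}")
    return matches[0]


def production_time(name):
    """Parse the production stamp, assumed to be YYYYDDDHHMMSS, from a file name."""
    stamp = name.split(".")[-2]
    return datetime.strptime(stamp, "%Y%j%H%M%S")


listing = [
    "MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc",  # name quoted in this thread
    "MCD06COSP_M3_MODIS.A2002213.061.2020181150000.nc",  # made-up neighbour for the demo
]
name = pick_granule(listing, datetime(2002, 7, 1))
print(name, production_time(name))
```

Raising when zero or several files match keeps a reprocessed or missing month from being silently skipped or double-counted.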

rabernat avatar Mar 21 '22 12:03 rabernat

I'll see if I can find out about a TDS catalog. JSON files are provided, at least (top level).

RobertPincus avatar Mar 21 '22 12:03 RobertPincus

@RobertPincus thanks for the clarification. Here is the documentation on recipe contribution. (This was published just this morning, so if anything doesn't make sense, that's my fault! Please let me know if so and I will amend.)

Re: your question about what a Zarr store can represent, a single Zarr store can include as many variables as we want, so long as they exist on the same time dimension.

As you'll see in the linked docs, you'll want to define a Recipe Object (in this case, an XarrayZarrRecipe), which requires a FilePattern as input. The FilePattern itself requires a url format function as input, which is a Python function that can create a valid url path to the source data based on, e.g., a date input.

I've worked out a start for this format function based on the (very helpful!) JSON catalog link you provided:

import pandas as pd
import requests

BASE_URL = "http://ladsweb.modaps.eosdis.nasa.gov"
DATASET_ID = "61/MCD06COSP_M3_MODIS"

dates = pd.date_range("2002-07-01", "2021-07-01", freq="MS")  # "MS" for "month start"

def make_url(date):
    """Make an OPeNDAP url for NASA MODIS-COSP data based on an input date.

    :param date: A member of the ``pandas.core.indexes.datetimes.DatetimeIndex``
        created with ``dates = pd.date_range("2002-07-01", "2021-07-01", freq="MS")``.
    """
    # Day of year, zero-padded to three digits (e.g. 1 January -> "001").
    # Assumption: the archive uses three-digit day-of-year directory names.
    day_of_year = f"{date.timetuple().tm_yday:03d}"
    # The JSON listing for this day tells us the full file name,
    # including the production timestamp we cannot know in advance.
    response = requests.get(
        f"{BASE_URL}/archive/allData/{DATASET_ID}/{date.year}/{day_of_year}.json"
    )
    response.raise_for_status()  # fail loudly if the listing is unavailable
    filename = response.json()[0]["name"]  # first file listed for this day

    return f"{BASE_URL}/opendap/hyrax/allData/{DATASET_ID}/{date.year}/{day_of_year}/{filename}"

This function faithfully reproduces the example url you provided in your first comment on this thread:

url = make_url(dates[0])
print(url)
http://ladsweb.modaps.eosdis.nasa.gov/opendap/hyrax/allData/61/MCD06COSP_M3_MODIS/2002/182/MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc

However I get an error when trying to open this URL with xarray:

import xarray as xr
ds = xr.open_dataset(url)
syntax error, unexpected WORD_WORD, expecting ';' or ','
context: Attributes { latitude { Float64 _FillValue -999.0000000000000; String units "degrees_north"; } longitude { Float64 _FillValue -999.0000000000000; String units "degrees_east"; } NC_GLOBAL { String YAML_config "grid_settings: gridsize: 1 projection: conformal lat_in: Latitude lon_in: Longitude lat_out: Latitude lon_out: Longitude fill_value: -999variable_settings: - name_in: Solar_Zenith name_out: Solar_Zenith attributes:  - name: long_name value: Solar Zenith Angle (Cell to Sun) for Daytime Scenes - name: units value: degrees - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 180.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Solar_Azimuth name_out: Solar_Azimuth attributes:  - name: long_name value: Solar Azimuth Angle (Cell to Sun) for Daytime Scenes - name: units value: degrees - name: _FillValue value: -999.0 - name: valid_min value: -180.0 - name: valid_max value: 180.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Sensor_Zenith name_out: Sensor_Zenith attributes:  - name: long_name value: Sensor Zenith Angle (Cell to Sensor) for Daytime Scenes - name: units value: degrees - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 180.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Sensor_Azimuth name_out: Sensor_Azimuth attributes:  - name: long_name value: Sensor Azimuth Angle (Cell to Sensor) for Daytime Scenes - name: units value: degrees - name: _FillValue value: -999.0 - name: valid_min value: -180.0 - name: valid_max value: 180.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Cloud_Top_Pressure name_out: Cloud_Top_Pressure attributes:  - name: long_name value: Cloud Top Pressure for Daytime Scenes - name: units value: mb - name: _FillValue value: -999.0 - name: valid_min value: 1.0 - name: valid_max value: 
1100.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Cloud_Fraction name_out: Cloud_Mask_Fraction attributes:  - name: long_name value: Cloud Fraction from Cloud Mask for Daytime Scenes - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Cloud_Fraction name_out: Cloud_Mask_Fraction_Low attributes:  - name: long_name value: Cloud Fraction from Cloud Mask (Low, CTP GE 680 hPa) for Daytime Scenes - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - Mask_Low - name_in: Cloud_Fraction name_out: Cloud_Mask_Fraction_Mid attributes:  - name: long_name value: Cloud Fraction from Cloud Mask (Mid, 680 hPa GT CTP GE 440 hPa) for Daytime Scenes - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - Mask_Middle - name_in: Cloud_Fraction name_out: Cloud_Mask_Fraction_High attributes:  - name: long_name value: Cloud Fraction from Cloud Mask (High, CTP LT 440 hPa) for Daytime Scenes - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - Mask_High - name_in: Cloud_Optical_Thickness name_out: Cloud_Optical_Thickness_Liquid attributes:  - name: long_name value: Cloud Optical Thickness for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 150.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 2D_histograms: - name_out: 
JHisto_vs_Cloud_Particle_Size_Liquid primary_var: edges: [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0] joint_var: name_in: Cloud_Effective_Radius edges: [4.0, 8.0, 10.0, 13.0, 15.0, 20.0, 30.0] masks: - Mask_Valid_Range_CER - Mask_Liquid_Water_Phase_Clouds - name_in: Cloud_Optical_Thickness name_out: Cloud_Optical_Thickness_Ice attributes:  - name: long_name value: Cloud Optical Thickness for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 150.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 2D_histograms: - name_out: JHisto_vs_Cloud_Particle_Size_Ice primary_var: edges: [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0] joint_var: name_in: Cloud_Effective_Radius edges: [5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0] masks: - Mask_Ice_Phase_Clouds  - name_in: Cloud_Optical_Thickness name_out: Cloud_Optical_Thickness_Total attributes:  - name: long_name value: Cloud Optical Thickness for Combined (LiquidWater+Ice+Undetermined) Phase Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 150.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 2D_histograms: - name_out: JHisto_vs_Cloud_Top_Pressure primary_var: edges: [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0] joint_var: name_in: Cloud_Top_Pressure edges: [0.0, 180.0, 310.0, 440.0, 560.0, 680.0, 800.0, 10000.0] masks: - Mask_Valid_Range_CER - Mask_Combined_Phase_Clouds - name_in: Cloud_Optical_Thickness_PCL name_out: Cloud_Optical_Thickness_PCL_Total only_histograms: attributes:  - name: long_name value: Cloud Optical Thickness for Combined (LiquidWater+Ice+Undetermined) Phase Clouds (3.7 micron Retrieval for Partly Cloudy (PCL) Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 150.0 - name: scale_factor value: 
1.0 - name: add_offset value: 0.0 2D_histograms: - name_out: JHisto_vs_Cloud_Top_Pressure primary_var: edges: [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0] joint_var: name_in: Cloud_Top_Pressure edges: [0.0, 180.0, 310.0, 440.0, 560.0, 680.0, 800.0, 10000.0] masks: - Mask_Valid_Range_CERPCL - Mask_Combined_Phase_Clouds - name_in: Cloud_Optical_Thickness_Log name_out: Cloud_Optical_Thickness_Log10_Liquid attributes:  - name: long_name value: Cloud Optical Thickness Log10 for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: -2.0 - name: valid_max value: 2.176 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Valid_Range_CER - Mask_Liquid_Water_Phase_Clouds - name_in: Cloud_Optical_Thickness_Log name_out: Cloud_Optical_Thickness_Log10_Ice attributes:  - name: long_name value: Cloud Optical Thickness Log10 for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: -2.0 - name: valid_max value: 2.176 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Ice_Phase_Clouds  - name_in: Cloud_Optical_Thickness_Log name_out: Cloud_Optical_Thickness_Log10_Total attributes:  - name: long_name value: Cloud Optical Thickness Log10 for Combined (LiquidWater+Ice+Undetermined) Phase Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: -2.0 - name: valid_max value: 2.176 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Valid_Range_CER - Mask_Combined_Phase_Clouds - name_in: Cloud_Effective_Radius name_out: Cloud_Particle_Size_Liquid attributes:  - name: long_name value: Cloud Effective Radius for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: microns - name: _FillValue value: -999.0 - name: valid_min value: 4.0 - name: 
valid_max value: 30.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Valid_Range_CER - Mask_Liquid_Water_Phase_Clouds - name_in: Cloud_Effective_Radius name_out: Cloud_Particle_Size_Ice attributes:  - name: long_name value: Cloud Effective Radius for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: microns - name: _FillValue value: -999.0 - name: valid_min value: 5.0 - name: valid_max value: 60.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Ice_Phase_Clouds - name_in: Cloud_Water_Path name_out: Cloud_Water_Path_Liquid attributes:  - name: long_name value: Cloud Water Path for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: g/m^2  - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 3000.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Valid_Range_CER - Mask_Liquid_Water_Phase_Clouds - name_in: Cloud_Water_Path name_out: Cloud_Water_Path_Ice attributes:  - name: long_name value: Cloud Water Path for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: g/m^2 - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 6000.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Ice_Phase_Clouds - name_in: COPR_Liquid name_out: Cloud_Retrieval_Fraction_Liquid attributes:  - name: long_name value: Cloud Optical Properties Retrieval Fraction (Liquid Water Clouds) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 - name_in: COPR_Ice name_out: Cloud_Retrieval_Fraction_Ice attributes:  - name: long_name value: Cloud Optical Properties Retrieval Fraction (Ice Clouds) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor 
value: 1.0 - name: add_offset value: 0.0 - name_in: COPR_Combined name_out: Cloud_Retrieval_Fraction_Total attributes:  - name: long_name value: Cloud Optical Properties Retrieval Fraction (Combined (LiquidWater+Ice+Undetermined) Phase Clouds) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0"; String Yori_version "1.3.16"; String daily_defn_of_day_adjustment "False"; String input_files "MCD06COSP_D3_MODIS.A2002185.061.2020179074148.nc,MCD06COSP_D3_MODIS.A2002186.061.2020179074020.nc,MCD06COSP_D3_MODIS.A2002187.061.2020179080105.nc,MCD06COSP_D3_MODIS.A2002188.061.2020179073800.nc,MCD06COSP_D3_MODIS.A2002189.061.2020179075527.nc,MCD06COSP_D3_MODIS.A2002190.061.2020181140712.nc,MCD06COSP_D3_MODIS.A2002191.061.2020179073354.nc,MCD06COSP_D3_MODIS.A2002192.061.2020181140657.nc,MCD06COSP_D3_MODIS.A2002193.061.2020181140639.nc,MCD06COSP_D3_MODIS.A2002194.061.2020181140633.nc,MCD06COSP_D3_MODIS.A2002195.061.2020179073600.nc,MCD06COSP_D3_MODIS.A2002196.061.2020179071759.nc,MCD06COSP_D3_MODIS.A2002197.061.2020179073136.nc,MCD06COSP_D3_MODIS.A2002198.061.2020181140638.nc,MCD06COSP_D3_MODIS.A2002199.061.2020179073626.nc,MCD06COSP_D3_MODIS.A2002200.061.2020181140632.nc,MCD06COSP_D3_MODIS.A2002201.061.2020181140623.nc,MCD06COSP_D3_MODIS.A2002202.061.2020179073345.nc,MCD06COSP_D3_MODIS.A2002203.061.2020179072223.nc,MCD06COSP_D3_MODIS.A2002204.061.2020179072036.nc,MCD06COSP_D3_MODIS.A2002205.061.2020179074935.nc,MCD06COSP_D3_MODIS.A2002206.061.2020179072758.nc,MCD06COSP_D3_MODIS.A2002207.061.2020179074751.nc,MCD06COSP_D3_MODIS.A2002208.061.2020179074110.nc,MCD06COSP_D3_MODIS.A2002209.061.2020179073958.nc,MCD06COSP_D3_MODIS.A2002210.061.2020181140608.nc,MCD06COSP_D3_MODIS.A2002211.061.2020181140441.nc,MCD06COSP_D3_MODIS.A2002212.061.2020181140457.nc"; String history ""; String source "idl 8.4, mcd06cosp_preyori 20191204-1, yori 1.3.16"; String 
date_created "2020-06-29T14:58:03Z"; String product_name "MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc"; String LocalGranuleID "MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc"; String Conventions "CF-1.6, ACDD-1.3"; String ShortName "MCD06COSP_M3_MODIS"; String product_version "6.1.2"; String AlgorithmType "OPS"; String identifier_product_doi "10.5067/MODIS/MCD06COSP_M3_MODIS.061"; String identifier_product_doi_authority "http://dx.doi.org/"; String ancillary_files ""; String DataCenterId "UWI-MAD/SSEC/ASIPS"; String project "NASA VIIRS Atmosphere SIPS"; String creator_name "NASA VIIRS Atmosphere SIPS"; String creator_url "https://sips.ssec.wisc.edu/"; String creator_email "[email protected]"; String creator_institution "Space Science & Engineering Center, University of Wisconsin - Madison"; String publisher_name "LAADS"; String publisher_url "https://ladsweb.modaps.eosdis.nasa.gov/"; String publisher_email "[email protected]"; String publisher_institution "NASA Level-1 and Atmosphere Archive & Distribution System"; String time_coverage_start "2002-07-01T00:00:00.000000"; String time_coverage_end "2002-07-31T23:59:59.000000"; String xmlmetadata "<?xml version="1.0"^?><!DOCTYPE GranuleMetaDataFile SYSTEM "http://ecsinfo.gsfc.nasa.gov/ECSInfo/ecsmetadata/dtds/DPL/ECS/ScienceGranuleMetadata.dtd"><GranuleMetaDataFile> <DTDVersion>1.0</DTDVersion> <DataCenterId>UWI-MAD/SSEC/ASIPS</DataCenterId> <GranuleURMetaData> <CollectionMetaData> <ShortName>MCD06COSP_M3_MODIS</ShortName> <VersionID>61</VersionID> </CollectionMetaData> <ECSDataGranule> <ReprocessingPlanned>no further reprocessing anticipated</ReprocessingPlanned> <LocalGranuleID>MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc</LocalGranuleID> <ProductionDateTime>2020-06-29 14:58:49.491586</ProductionDateTime> <LocalVersionID>61</LocalVersionID> </ECSDataGranule> <PGEVersionClass> <PGEVersion>6.1.2</PGEVersion> </PGEVersionClass> <RangeDateTime> <RangeEndingTime>23:59:59.000000</RangeEndingTime> 
<RangeEndingDate>2002-07-31</RangeEndingDate> <RangeBeginningTime>00:00:00.000000</RangeBeginningTime> <RangeBeginningDate>2002-07-01</RangeBeginningDate> </RangeDateTime> <SpatialDomainContainer> <HorizontalSpatialDomainContainer> <BoundingRectangle> <WestBoundingCoordinate>-180</WestBoundingCoordinate> <NorthBoundingCoordinate>90</NorthBoundingCoordinate> <EastBoundingCoordinate>180</EastBoundingCoordinate> <SouthBoundingCoordinate>-90</SouthBoundingCoordinate> </BoundingRectangle> </HorizontalSpatialDomainContainer> </SpatialDomainContainer> <Platform> <PlatformShortName>Suomi NPP</PlatformShortName> <Instrument> <InstrumentShortName>VIIRS</InstrumentShortName> <Sensor> <SensorShortName>VIIRS</SensorShortName> </Sensor> </Instrument> </Platform> <InputGranule> <InputPointer>MCD06COSP_D3_MODIS.A2002185.061.2020179074148.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002186.061.2020179074020.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002187.061.2020179080105.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002188.061.2020179073800.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002189.061.2020179075527.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002190.061.2020181140712.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002191.061.2020179073354.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002192.061.2020181140657.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002193.061.2020181140639.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002194.061.2020181140633.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002195.061.2020179073600.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002196.061.2020179071759.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002197.061.2020179073136.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002198.061.2020181140638.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002199.061.2020179073626.nc</InputPointer> 
<InputPointer>MCD06COSP_D3_MODIS.A2002200.061.2020181140632.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002201.061.2020181140623.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002202.061.2020179073345.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002203.061.2020179072223.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002204.061.2020179072036.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002205.061.2020179074935.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002206.061.2020179072758.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002207.061.2020179074751.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002208.061.2020179074110.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002209.061.2020179073958.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002210.061.2020181140608.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002211.061.2020181140441.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002212.061.2020181140457.nc</InputPointer> </InputGranule> <AncillaryInputGranules> </AncillaryInputGranules> </GranuleURMetaData></GranuleMetaDataFile>"; String platform "Aqua, Terra"; String instrument "MODIS"; String processing_level "L3"; String format "NetCDF4"; String title "Aqua/Terra MODIS Cloud Properties Level 3 monthly, 1x1 degree grid (MCD06COSP_M3_MODIS)"; String long_name "MODIS (Aqua/Terra) Cloud Properties Level 3 monthly, 1x1 degree grid"; String version_id "061"; Float64 geospatial_lat_max 90.00000000000000; Float64 geospatial_lat_min -90.00000000000000; Float64 geospatial_lon_min 180.0000000000000; Float64 geospatial_lon_max -180.0000000000000; Float64 NorthBoundingCoordinate 90.00000000000000; Float64 SouthBoundingCoordinate -90.00000000000000; Float64 EastBoundingCoordinate 180.0000000000000; Float64 WestBoundingCoordinate -180.0000000000000; Float64 latitude_resolution 1.000000000000000; Float64 longitude_resolution 1.000000000000000; String license 
"http://science.nasa.gov/earth-science/earth-science-data/data-information-policy/"; String stdname_vocabulary "NetCDF Climate and Forecast (CF) Metadata Convention"; String keywords_vocabulary "NASA Global Change Master Directory (GCMD) Science Keywords"; String keywords "EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD MICROPHYSICS > CLOUD OPTICAL DEPTH/THICKNESS, EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD PROPERTIES > CLOUD TOP HEIGHT, EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD PROPERTIES > CLOUD FRACTION"; String naming_authority "gov.nasa.gsfc.sci.atmos"; }}
Illegal attribute
context: Attributes { latitude { Float64 _FillValue -999.0000000000000; String units "degrees_north"; } longitude { Float64 _FillValue -999.0000000000000; String units "degrees_east"; } NC_GLOBAL { String YAML_config "grid_settings: gridsize: 1 projection: conformal lat_in: Latitude lon_in: Longitude lat_out: Latitude lon_out: Longitude fill_value: -999variable_settings: - name_in: Solar_Zenith name_out: Solar_Zenith attributes:  - name: long_name value: Solar Zenith Angle (Cell to Sun) for Daytime Scenes - name: units value: degrees - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 180.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Solar_Azimuth name_out: Solar_Azimuth attributes:  - name: long_name value: Solar Azimuth Angle (Cell to Sun) for Daytime Scenes - name: units value: degrees - name: _FillValue value: -999.0 - name: valid_min value: -180.0 - name: valid_max value: 180.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Sensor_Zenith name_out: Sensor_Zenith attributes:  - name: long_name value: Sensor Zenith Angle (Cell to Sensor) for Daytime Scenes - name: units value: degrees - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 180.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Sensor_Azimuth name_out: Sensor_Azimuth attributes:  - name: long_name value: Sensor Azimuth Angle (Cell to Sensor) for Daytime Scenes - name: units value: degrees - name: _FillValue value: -999.0 - name: valid_min value: -180.0 - name: valid_max value: 180.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Cloud_Top_Pressure name_out: Cloud_Top_Pressure attributes:  - name: long_name value: Cloud Top Pressure for Daytime Scenes - name: units value: mb - name: _FillValue value: -999.0 - name: valid_min value: 1.0 - name: valid_max value: 
1100.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Cloud_Fraction name_out: Cloud_Mask_Fraction attributes:  - name: long_name value: Cloud Fraction from Cloud Mask for Daytime Scenes - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - name_in: Cloud_Fraction name_out: Cloud_Mask_Fraction_Low attributes:  - name: long_name value: Cloud Fraction from Cloud Mask (Low, CTP GE 680 hPa) for Daytime Scenes - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - Mask_Low - name_in: Cloud_Fraction name_out: Cloud_Mask_Fraction_Mid attributes:  - name: long_name value: Cloud Fraction from Cloud Mask (Mid, 680 hPa GT CTP GE 440 hPa) for Daytime Scenes - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - Mask_Middle - name_in: Cloud_Fraction name_out: Cloud_Mask_Fraction_High attributes:  - name: long_name value: Cloud Fraction from Cloud Mask (High, CTP LT 440 hPa) for Daytime Scenes - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Day - Mask_High - name_in: Cloud_Optical_Thickness name_out: Cloud_Optical_Thickness_Liquid attributes:  - name: long_name value: Cloud Optical Thickness for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 150.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 2D_histograms: - name_out: 
JHisto_vs_Cloud_Particle_Size_Liquid primary_var: edges: [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0] joint_var: name_in: Cloud_Effective_Radius edges: [4.0, 8.0, 10.0, 13.0, 15.0, 20.0, 30.0] masks: - Mask_Valid_Range_CER - Mask_Liquid_Water_Phase_Clouds - name_in: Cloud_Optical_Thickness name_out: Cloud_Optical_Thickness_Ice attributes:  - name: long_name value: Cloud Optical Thickness for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 150.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 2D_histograms: - name_out: JHisto_vs_Cloud_Particle_Size_Ice primary_var: edges: [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0] joint_var: name_in: Cloud_Effective_Radius edges: [5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0] masks: - Mask_Ice_Phase_Clouds  - name_in: Cloud_Optical_Thickness name_out: Cloud_Optical_Thickness_Total attributes:  - name: long_name value: Cloud Optical Thickness for Combined (LiquidWater+Ice+Undetermined) Phase Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 150.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 2D_histograms: - name_out: JHisto_vs_Cloud_Top_Pressure primary_var: edges: [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0] joint_var: name_in: Cloud_Top_Pressure edges: [0.0, 180.0, 310.0, 440.0, 560.0, 680.0, 800.0, 10000.0] masks: - Mask_Valid_Range_CER - Mask_Combined_Phase_Clouds - name_in: Cloud_Optical_Thickness_PCL name_out: Cloud_Optical_Thickness_PCL_Total only_histograms: attributes:  - name: long_name value: Cloud Optical Thickness for Combined (LiquidWater+Ice+Undetermined) Phase Clouds (3.7 micron Retrieval for Partly Cloudy (PCL) Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 150.0 - name: scale_factor value: 
1.0 - name: add_offset value: 0.0 2D_histograms: - name_out: JHisto_vs_Cloud_Top_Pressure primary_var: edges: [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0] joint_var: name_in: Cloud_Top_Pressure edges: [0.0, 180.0, 310.0, 440.0, 560.0, 680.0, 800.0, 10000.0] masks: - Mask_Valid_Range_CERPCL - Mask_Combined_Phase_Clouds - name_in: Cloud_Optical_Thickness_Log name_out: Cloud_Optical_Thickness_Log10_Liquid attributes:  - name: long_name value: Cloud Optical Thickness Log10 for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: -2.0 - name: valid_max value: 2.176 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Valid_Range_CER - Mask_Liquid_Water_Phase_Clouds - name_in: Cloud_Optical_Thickness_Log name_out: Cloud_Optical_Thickness_Log10_Ice attributes:  - name: long_name value: Cloud Optical Thickness Log10 for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: -2.0 - name: valid_max value: 2.176 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Ice_Phase_Clouds  - name_in: Cloud_Optical_Thickness_Log name_out: Cloud_Optical_Thickness_Log10_Total attributes:  - name: long_name value: Cloud Optical Thickness Log10 for Combined (LiquidWater+Ice+Undetermined) Phase Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: -2.0 - name: valid_max value: 2.176 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Valid_Range_CER - Mask_Combined_Phase_Clouds - name_in: Cloud_Effective_Radius name_out: Cloud_Particle_Size_Liquid attributes:  - name: long_name value: Cloud Effective Radius for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: microns - name: _FillValue value: -999.0 - name: valid_min value: 4.0 - name: 
valid_max value: 30.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Valid_Range_CER - Mask_Liquid_Water_Phase_Clouds - name_in: Cloud_Effective_Radius name_out: Cloud_Particle_Size_Ice attributes:  - name: long_name value: Cloud Effective Radius for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: microns - name: _FillValue value: -999.0 - name: valid_min value: 5.0 - name: valid_max value: 60.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Ice_Phase_Clouds - name_in: Cloud_Water_Path name_out: Cloud_Water_Path_Liquid attributes:  - name: long_name value: Cloud Water Path for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: g/m^2  - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 3000.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Valid_Range_CER - Mask_Liquid_Water_Phase_Clouds - name_in: Cloud_Water_Path name_out: Cloud_Water_Path_Ice attributes:  - name: long_name value: Cloud Water Path for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes) - name: units value: g/m^2 - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 6000.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 masks: - Mask_Ice_Phase_Clouds - name_in: COPR_Liquid name_out: Cloud_Retrieval_Fraction_Liquid attributes:  - name: long_name value: Cloud Optical Properties Retrieval Fraction (Liquid Water Clouds) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0 - name_in: COPR_Ice name_out: Cloud_Retrieval_Fraction_Ice attributes:  - name: long_name value: Cloud Optical Properties Retrieval Fraction (Ice Clouds) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor 
value: 1.0 - name: add_offset value: 0.0 - name_in: COPR_Combined name_out: Cloud_Retrieval_Fraction_Total attributes:  - name: long_name value: Cloud Optical Properties Retrieval Fraction (Combined (LiquidWater+Ice+Undetermined) Phase Clouds) - name: units value: none - name: _FillValue value: -999.0 - name: valid_min value: 0.0 - name: valid_max value: 1.0 - name: scale_factor value: 1.0 - name: add_offset value: 0.0"; String Yori_version "1.3.16"; String daily_defn_of_day_adjustment "False"; String input_files "MCD06COSP_D3_MODIS.A2002185.061.2020179074148.nc,MCD06COSP_D3_MODIS.A2002186.061.2020179074020.nc,MCD06COSP_D3_MODIS.A2002187.061.2020179080105.nc,MCD06COSP_D3_MODIS.A2002188.061.2020179073800.nc,MCD06COSP_D3_MODIS.A2002189.061.2020179075527.nc,MCD06COSP_D3_MODIS.A2002190.061.2020181140712.nc,MCD06COSP_D3_MODIS.A2002191.061.2020179073354.nc,MCD06COSP_D3_MODIS.A2002192.061.2020181140657.nc,MCD06COSP_D3_MODIS.A2002193.061.2020181140639.nc,MCD06COSP_D3_MODIS.A2002194.061.2020181140633.nc,MCD06COSP_D3_MODIS.A2002195.061.2020179073600.nc,MCD06COSP_D3_MODIS.A2002196.061.2020179071759.nc,MCD06COSP_D3_MODIS.A2002197.061.2020179073136.nc,MCD06COSP_D3_MODIS.A2002198.061.2020181140638.nc,MCD06COSP_D3_MODIS.A2002199.061.2020179073626.nc,MCD06COSP_D3_MODIS.A2002200.061.2020181140632.nc,MCD06COSP_D3_MODIS.A2002201.061.2020181140623.nc,MCD06COSP_D3_MODIS.A2002202.061.2020179073345.nc,MCD06COSP_D3_MODIS.A2002203.061.2020179072223.nc,MCD06COSP_D3_MODIS.A2002204.061.2020179072036.nc,MCD06COSP_D3_MODIS.A2002205.061.2020179074935.nc,MCD06COSP_D3_MODIS.A2002206.061.2020179072758.nc,MCD06COSP_D3_MODIS.A2002207.061.2020179074751.nc,MCD06COSP_D3_MODIS.A2002208.061.2020179074110.nc,MCD06COSP_D3_MODIS.A2002209.061.2020179073958.nc,MCD06COSP_D3_MODIS.A2002210.061.2020181140608.nc,MCD06COSP_D3_MODIS.A2002211.061.2020181140441.nc,MCD06COSP_D3_MODIS.A2002212.061.2020181140457.nc"; String history ""; String source "idl 8.4, mcd06cosp_preyori 20191204-1, yori 1.3.16"; String 
date_created "2020-06-29T14:58:03Z"; String product_name "MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc"; String LocalGranuleID "MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc"; String Conventions "CF-1.6, ACDD-1.3"; String ShortName "MCD06COSP_M3_MODIS"; String product_version "6.1.2"; String AlgorithmType "OPS"; String identifier_product_doi "10.5067/MODIS/MCD06COSP_M3_MODIS.061"; String identifier_product_doi_authority "http://dx.doi.org/"; String ancillary_files ""; String DataCenterId "UWI-MAD/SSEC/ASIPS"; String project "NASA VIIRS Atmosphere SIPS"; String creator_name "NASA VIIRS Atmosphere SIPS"; String creator_url "https://sips.ssec.wisc.edu/"; String creator_email "[email protected]"; String creator_institution "Space Science & Engineering Center, University of Wisconsin - Madison"; String publisher_name "LAADS"; String publisher_url "https://ladsweb.modaps.eosdis.nasa.gov/"; String publisher_email "[email protected]"; String publisher_institution "NASA Level-1 and Atmosphere Archive & Distribution System"; String time_coverage_start "2002-07-01T00:00:00.000000"; String time_coverage_end "2002-07-31T23:59:59.000000"; String xmlmetadata "<?xml version="1.0"^?><!DOCTYPE GranuleMetaDataFile SYSTEM "http://ecsinfo.gsfc.nasa.gov/ECSInfo/ecsmetadata/dtds/DPL/ECS/ScienceGranuleMetadata.dtd"><GranuleMetaDataFile> <DTDVersion>1.0</DTDVersion> <DataCenterId>UWI-MAD/SSEC/ASIPS</DataCenterId> <GranuleURMetaData> <CollectionMetaData> <ShortName>MCD06COSP_M3_MODIS</ShortName> <VersionID>61</VersionID> </CollectionMetaData> <ECSDataGranule> <ReprocessingPlanned>no further reprocessing anticipated</ReprocessingPlanned> <LocalGranuleID>MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc</LocalGranuleID> <ProductionDateTime>2020-06-29 14:58:49.491586</ProductionDateTime> <LocalVersionID>61</LocalVersionID> </ECSDataGranule> <PGEVersionClass> <PGEVersion>6.1.2</PGEVersion> </PGEVersionClass> <RangeDateTime> <RangeEndingTime>23:59:59.000000</RangeEndingTime> 
<RangeEndingDate>2002-07-31</RangeEndingDate> <RangeBeginningTime>00:00:00.000000</RangeBeginningTime> <RangeBeginningDate>2002-07-01</RangeBeginningDate> </RangeDateTime> <SpatialDomainContainer> <HorizontalSpatialDomainContainer> <BoundingRectangle> <WestBoundingCoordinate>-180</WestBoundingCoordinate> <NorthBoundingCoordinate>90</NorthBoundingCoordinate> <EastBoundingCoordinate>180</EastBoundingCoordinate> <SouthBoundingCoordinate>-90</SouthBoundingCoordinate> </BoundingRectangle> </HorizontalSpatialDomainContainer> </SpatialDomainContainer> <Platform> <PlatformShortName>Suomi NPP</PlatformShortName> <Instrument> <InstrumentShortName>VIIRS</InstrumentShortName> <Sensor> <SensorShortName>VIIRS</SensorShortName> </Sensor> </Instrument> </Platform> <InputGranule> <InputPointer>MCD06COSP_D3_MODIS.A2002185.061.2020179074148.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002186.061.2020179074020.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002187.061.2020179080105.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002188.061.2020179073800.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002189.061.2020179075527.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002190.061.2020181140712.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002191.061.2020179073354.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002192.061.2020181140657.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002193.061.2020181140639.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002194.061.2020181140633.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002195.061.2020179073600.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002196.061.2020179071759.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002197.061.2020179073136.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002198.061.2020181140638.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002199.061.2020179073626.nc</InputPointer> 
<InputPointer>MCD06COSP_D3_MODIS.A2002200.061.2020181140632.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002201.061.2020181140623.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002202.061.2020179073345.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002203.061.2020179072223.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002204.061.2020179072036.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002205.061.2020179074935.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002206.061.2020179072758.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002207.061.2020179074751.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002208.061.2020179074110.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002209.061.2020179073958.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002210.061.2020181140608.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002211.061.2020181140441.nc</InputPointer> <InputPointer>MCD06COSP_D3_MODIS.A2002212.061.2020181140457.nc</InputPointer> </InputGranule> <AncillaryInputGranules> </AncillaryInputGranules> </GranuleURMetaData></GranuleMetaDataFile>"; String platform "Aqua, Terra"; String instrument "MODIS"; String processing_level "L3"; String format "NetCDF4"; String title "Aqua/Terra MODIS Cloud Properties Level 3 monthly, 1x1 degree grid (MCD06COSP_M3_MODIS)"; String long_name "MODIS (Aqua/Terra) Cloud Properties Level 3 monthly, 1x1 degree grid"; String version_id "061"; Float64 geospatial_lat_max 90.00000000000000; Float64 geospatial_lat_min -90.00000000000000; Float64 geospatial_lon_min 180.0000000000000; Float64 geospatial_lon_max -180.0000000000000; Float64 NorthBoundingCoordinate 90.00000000000000; Float64 SouthBoundingCoordinate -90.00000000000000; Float64 EastBoundingCoordinate 180.0000000000000; Float64 WestBoundingCoordinate -180.0000000000000; Float64 latitude_resolution 1.000000000000000; Float64 longitude_resolution 1.000000000000000; String license 
"http://science.nasa.gov/earth-science/earth-science-data/data-information-policy/"; String stdname_vocabulary "NetCDF Climate and Forecast (CF) Metadata Convention"; String keywords_vocabulary "NASA Global Change Master Directory (GCMD) Science Keywords"; String keywords "EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD MICROPHYSICS > CLOUD OPTICAL DEPTH/THICKNESS, EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD PROPERTIES > CLOUD TOP HEIGHT, EARTH SCIENCE > ATMOSPHERE > CLOUDS > CLOUD PROPERTIES > CLOUD FRACTION"; String naming_authority "gov.nasa.gsfc.sci.atmos"; }}

And the resulting dataset has no variables:

print(ds)
<xarray.Dataset>
Dimensions:    (latitude: 180, longitude: 360)
Coordinates:
  * latitude   (latitude) float64 -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
  * longitude  (longitude) float64 -179.5 -178.5 -177.5 ... 177.5 178.5 179.5
Data variables:
    *empty*

Perhaps I am missing some essential keyword argument(s) for xr.open_dataset?

cisaacstern avatar Mar 22 '22 18:03 cisaacstern

@cisaacstern Thanks for this. I'll catch up later this week, but meanwhile, perhaps you can try passing the engine='netcdf4' keyword to xr.open_dataset?

RobertPincus avatar Mar 22 '22 18:03 RobertPincus

This error message is coming from the netCDF4 C library.

import netCDF4
url = 'http://ladsweb.modaps.eosdis.nasa.gov/opendap/hyrax/allData/61/MCD06COSP_M3_MODIS/2002/182/MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc'
ds = netCDF4.Dataset(url, "r")

This means that the Hyrax server is emitting data that cannot be properly parsed by the official Unidata netCDF4 library. This is a problem with the server and needs to be brought to the attention of the NASA system administrator.

Is there a direct link to netCDF file download (rather than OPeNDAP endpoint)?

rabernat avatar Mar 22 '22 18:03 rabernat

One can access the files through a GUI by appending .dmr.html. That provides a button where one can download the data in several formats, but I haven't been able to see the underlying URLs yet.

RobertPincus avatar Mar 22 '22 19:03 RobertPincus

That website - https://ladsweb.modaps.eosdis.nasa.gov/opendap/hyrax/allData/61/MCD06COSP_M3_MODIS/2002/182/MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc.dmr.html - does not show any data variables either, just lon and lat.

rabernat avatar Mar 22 '22 19:03 rabernat

To get variables one has to open a group, e.g. Cloud_Optical_Thickness_Liquid

RobertPincus avatar Mar 22 '22 19:03 RobertPincus

I cannot discover any groups from that opendap url.

import netCDF4
url = 'http://ladsweb.modaps.eosdis.nasa.gov/opendap/hyrax/allData/61/MCD06COSP_M3_MODIS/2002/182/MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc'
ds = netCDF4.Dataset(url, "r")
print(ds.groups) # --> {}

Where are you inputting the group information when you access the data?

rabernat avatar Mar 22 '22 20:03 rabernat

My attention is a little split these days, sorry. Like you both, I have been unable to open the files remotely via OpenDAP. I will see what I can learn from NASA but they have not been very responsive. I will also see if I can sleuth out direct download links, which I have not been able to find anywhere obvious.

Once a file is downloaded, I've been able to see data with, e.g.:

import xarray as xr
file = 'MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc'
f = xr.open_dataset(file, engine='netcdf4', group='Cloud_Mask_Fraction')

RobertPincus avatar Mar 23 '22 02:03 RobertPincus

Now the server is returning a 500 server error

https://ladsweb.modaps.eosdis.nasa.gov/opendap/hyrax/allData/61/MCD06COSP_M3_MODIS/2002/182/MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc.dmr.html

rabernat avatar Mar 23 '22 15:03 rabernat

The html page is back up. Inspecting the source there reveals that appending .dap.nc4 is the path to direct download:

<input type="button" value="Get as NetCDF 4" onclick="getAs_button_action('NetCDF-4 Data', '.dap.nc4')">
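As a rough offline sketch of how the two suffixes mentioned so far (.dmr.html for the browsable page, .dap.nc4 for the NetCDF-4 download) map onto URLs: the helper below is hypothetical (hyrax_url, HYRAX_ROOT and the regular expression are not part of any library), and it assumes the "A2002182" token in the file name encodes acquisition year and day of year, which matches the /2002/182/ directory in the URLs above.

```python
import re

# Root of the Hyrax tree, copied from the URLs quoted in this thread.
HYRAX_ROOT = "https://ladsweb.modaps.eosdis.nasa.gov/opendap/hyrax/allData"
DATASET_ID = "61/MCD06COSP_M3_MODIS"

# Assumption: ".A2002182." means acquisition year 2002, day of year 182.
_ACQUISITION = re.compile(r"\.A(\d{4})(\d{3})\.")


def hyrax_url(filename, suffix=".dap.nc4"):
    """Build a Hyrax URL for a granule file name (hypothetical helper)."""
    match = _ACQUISITION.search(filename)
    if match is None:
        raise ValueError(f"no acquisition date token in {filename!r}")
    year, day_of_year = match.groups()
    return f"{HYRAX_ROOT}/{DATASET_ID}/{year}/{day_of_year}/{filename}{suffix}"


name = "MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc"
print(hyrax_url(name, ".dmr.html"))  # browsable page
print(hyrax_url(name))               # NetCDF-4 download
```

This only helps when the file name is already known; discovering file names still needs a directory listing.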

Amending the earlier make_url function accordingly, I can now download the source files. What appear to be the group names ('Cloud_Mask_Fraction', etc.) are discoverable in the dataset's ds.YAML_config attribute, but none of these names are openable as groups using the syntax provided in https://github.com/pangeo-forge/staged-recipes/issues/125#issuecomment-1075838942:

import fsspec
import pandas as pd
import requests
import xarray as xr
import yaml

BASE_URL = "http://ladsweb.modaps.eosdis.nasa.gov"
DATASET_ID = "61/MCD06COSP_M3_MODIS"

dates = pd.date_range("2002-07-01", "2021-07-01", freq="MS")  # "MS" for "month start"


def make_url(date):
    """Make a NetCDF4 download url for NASA MODIS-COSP data based on an input date.
    
    :param date: A member of the ``pandas.core.indexes.datetimes.DatetimeIndex``
        created with ``dates = pd.date_range("2002-07-01", "2021-07-01", freq="MS")``.
    """
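    # Zero-padded day of year, e.g. "182". Assumes LAADS directory names are
    # always three digits, which matters for dates before day 100.
    day_of_year = date.strftime("%j")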
    response = requests.get(
        f"{BASE_URL}/archive/allData/{DATASET_ID}/{date.year}/{day_of_year}.json"
    )
    filename = [r["name"] for r in response.json()].pop(0)
    
    return f"{BASE_URL}/opendap/hyrax/allData/{DATASET_ID}/{date.year}/{day_of_year}/{filename}.dap.nc4"


test_filename = "test.nc"

with fsspec.open(make_url(dates[0])) as src:
    with open(test_filename, mode="wb") as dst:
        dst.write(src.read())

ds = xr.open_dataset(test_filename, engine='netcdf4')
yaml_config = yaml.safe_load(ds.YAML_config)
group_name_pairs = [(v["name_in"], v["name_out"]) for v in yaml_config["variable_settings"]]

for pair in group_name_pairs:
    for group in pair:
        try:
            ds = xr.open_dataset(test_filename, engine='netcdf4', group=group)
        except OSError as e:
            print(e)
[Errno group not found: Solar_Zenith] 'Solar_Zenith'
[Errno group not found: Solar_Zenith] 'Solar_Zenith'
[Errno group not found: Solar_Azimuth] 'Solar_Azimuth'
[Errno group not found: Solar_Azimuth] 'Solar_Azimuth'
[Errno group not found: Sensor_Zenith] 'Sensor_Zenith'
[Errno group not found: Sensor_Zenith] 'Sensor_Zenith'
[Errno group not found: Sensor_Azimuth] 'Sensor_Azimuth'
[Errno group not found: Sensor_Azimuth] 'Sensor_Azimuth'
[Errno group not found: Cloud_Top_Pressure] 'Cloud_Top_Pressure'
[Errno group not found: Cloud_Top_Pressure] 'Cloud_Top_Pressure'
[Errno group not found: Cloud_Fraction] 'Cloud_Fraction'
[Errno group not found: Cloud_Mask_Fraction] 'Cloud_Mask_Fraction'
[Errno group not found: Cloud_Fraction] 'Cloud_Fraction'
[Errno group not found: Cloud_Mask_Fraction_Low] 'Cloud_Mask_Fraction_Low'
[Errno group not found: Cloud_Fraction] 'Cloud_Fraction'
[Errno group not found: Cloud_Mask_Fraction_Mid] 'Cloud_Mask_Fraction_Mid'
[Errno group not found: Cloud_Fraction] 'Cloud_Fraction'
[Errno group not found: Cloud_Mask_Fraction_High] 'Cloud_Mask_Fraction_High'
[Errno group not found: Cloud_Optical_Thickness] 'Cloud_Optical_Thickness'
[Errno group not found: Cloud_Optical_Thickness_Liquid] 'Cloud_Optical_Thickness_Liquid'
[Errno group not found: Cloud_Optical_Thickness] 'Cloud_Optical_Thickness'
[Errno group not found: Cloud_Optical_Thickness_Ice] 'Cloud_Optical_Thickness_Ice'
[Errno group not found: Cloud_Optical_Thickness] 'Cloud_Optical_Thickness'
[Errno group not found: Cloud_Optical_Thickness_Total] 'Cloud_Optical_Thickness_Total'
[Errno group not found: Cloud_Optical_Thickness_PCL] 'Cloud_Optical_Thickness_PCL'
[Errno group not found: Cloud_Optical_Thickness_PCL_Total] 'Cloud_Optical_Thickness_PCL_Total'
[Errno group not found: Cloud_Optical_Thickness_Log] 'Cloud_Optical_Thickness_Log'
[Errno group not found: Cloud_Optical_Thickness_Log10_Liquid] 'Cloud_Optical_Thickness_Log10_Liquid'
[Errno group not found: Cloud_Optical_Thickness_Log] 'Cloud_Optical_Thickness_Log'
[Errno group not found: Cloud_Optical_Thickness_Log10_Ice] 'Cloud_Optical_Thickness_Log10_Ice'
[Errno group not found: Cloud_Optical_Thickness_Log] 'Cloud_Optical_Thickness_Log'
[Errno group not found: Cloud_Optical_Thickness_Log10_Total] 'Cloud_Optical_Thickness_Log10_Total'
[Errno group not found: Cloud_Effective_Radius] 'Cloud_Effective_Radius'
[Errno group not found: Cloud_Particle_Size_Liquid] 'Cloud_Particle_Size_Liquid'
[Errno group not found: Cloud_Effective_Radius] 'Cloud_Effective_Radius'
[Errno group not found: Cloud_Particle_Size_Ice] 'Cloud_Particle_Size_Ice'
[Errno group not found: Cloud_Water_Path] 'Cloud_Water_Path'
[Errno group not found: Cloud_Water_Path_Liquid] 'Cloud_Water_Path_Liquid'
[Errno group not found: Cloud_Water_Path] 'Cloud_Water_Path'
[Errno group not found: Cloud_Water_Path_Ice] 'Cloud_Water_Path_Ice'
[Errno group not found: COPR_Liquid] 'COPR_Liquid'
[Errno group not found: Cloud_Retrieval_Fraction_Liquid] 'Cloud_Retrieval_Fraction_Liquid'
[Errno group not found: COPR_Ice] 'COPR_Ice'
[Errno group not found: Cloud_Retrieval_Fraction_Ice] 'Cloud_Retrieval_Fraction_Ice'
[Errno group not found: COPR_Combined] 'COPR_Combined'
[Errno group not found: Cloud_Retrieval_Fraction_Total] 'Cloud_Retrieval_Fraction_Total'
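Many names are tried more than once above, because name_in repeats across entries and sometimes equals name_out. A small sketch of collecting each candidate only once, in first-seen order; unique_group_candidates is a hypothetical helper and the two sample entries are copied from the config:

```python
def unique_group_candidates(variable_settings):
    """Return every name_in/name_out once, preserving first-seen order."""
    names = []
    for entry in variable_settings:
        names.extend((entry["name_in"], entry["name_out"]))
    # dict.fromkeys drops duplicates while keeping insertion order.
    return list(dict.fromkeys(names))


sample = [
    {"name_in": "Solar_Zenith", "name_out": "Solar_Zenith"},
    {"name_in": "Cloud_Fraction", "name_out": "Cloud_Mask_Fraction"},
    {"name_in": "Cloud_Fraction", "name_out": "Cloud_Mask_Fraction_Low"},
]
print(unique_group_candidates(sample))
```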

Here's the full YAML config:

{'grid_settings': {'gridsize': 1,
  'projection': 'conformal',
  'lat_in': 'Latitude',
  'lon_in': 'Longitude',
  'lat_out': 'Latitude',
  'lon_out': 'Longitude',
  'fill_value': -999},
 'variable_settings': [{'name_in': 'Solar_Zenith',
   'name_out': 'Solar_Zenith',
   'attributes': [{'name': 'long_name',
     'value': 'Solar Zenith Angle (Cell to Sun) for Daytime Scenes'},
    {'name': 'units', 'value': 'degrees'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 0.0},
    {'name': 'valid_max', 'value': 180.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   'masks': ['Mask_Day']},
  {'name_in': 'Solar_Azimuth',
   'name_out': 'Solar_Azimuth',
   'attributes': [{'name': 'long_name',
     'value': 'Solar Azimuth Angle (Cell to Sun) for Daytime Scenes'},
    {'name': 'units', 'value': 'degrees'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': -180.0},
    {'name': 'valid_max', 'value': 180.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   'masks': ['Mask_Day']},
  {'name_in': 'Sensor_Zenith',
   'name_out': 'Sensor_Zenith',
   'attributes': [{'name': 'long_name',
     'value': 'Sensor Zenith Angle (Cell to Sensor) for Daytime Scenes'},
    {'name': 'units', 'value': 'degrees'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 0.0},
    {'name': 'valid_max', 'value': 180.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   'masks': ['Mask_Day']},
  {'name_in': 'Sensor_Azimuth',
   'name_out': 'Sensor_Azimuth',
   'attributes': [{'name': 'long_name',
     'value': 'Sensor Azimuth Angle (Cell to Sensor) for Daytime Scenes'},
    {'name': 'units', 'value': 'degrees'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': -180.0},
    {'name': 'valid_max', 'value': 180.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   'masks': ['Mask_Day']},
  {'name_in': 'Cloud_Top_Pressure',
   'name_out': 'Cloud_Top_Pressure',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Top Pressure for Daytime Scenes'},
    {'name': 'units', 'value': 'mb'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 1.0},
    {'name': 'valid_max', 'value': 1100.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   'masks': ['Mask_Day']},
  {'name_in': 'Cloud_Fraction',
   'name_out': 'Cloud_Mask_Fraction',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Fraction from Cloud Mask for Daytime Scenes'},
    {'name': 'units', 'value': 'none'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 0.0},
    {'name': 'valid_max', 'value': 1.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   'masks': ['Mask_Day']},
  {'name_in': 'Cloud_Fraction',
   'name_out': 'Cloud_Mask_Fraction_Low',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Fraction from Cloud Mask (Low, CTP GE 680 hPa) for Daytime Scenes'},
    {'name': 'units', 'value': 'none'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 0.0},
    {'name': 'valid_max', 'value': 1.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   'masks': ['Mask_Day', 'Mask_Low']},
  {'name_in': 'Cloud_Fraction',
   'name_out': 'Cloud_Mask_Fraction_Mid',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Fraction from Cloud Mask (Mid, 680 hPa GT CTP GE 440 hPa) for Daytime Scenes'},
    {'name': 'units', 'value': 'none'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 0.0},
    {'name': 'valid_max', 'value': 1.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   'masks': ['Mask_Day', 'Mask_Middle']},
  {'name_in': 'Cloud_Fraction',
   'name_out': 'Cloud_Mask_Fraction_High',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Fraction from Cloud Mask (High, CTP LT 440 hPa) for Daytime Scenes'},
    {'name': 'units', 'value': 'none'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 0.0},
    {'name': 'valid_max', 'value': 1.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   'masks': ['Mask_Day', 'Mask_High']},
  {'name_in': 'Cloud_Optical_Thickness',
   'name_out': 'Cloud_Optical_Thickness_Liquid',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Optical Thickness for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
    {'name': 'units', 'value': 'none'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 0.0},
    {'name': 'valid_max', 'value': 150.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   '2D_histograms': [{'name_out': 'JHisto_vs_Cloud_Particle_Size_Liquid',
     'primary_var': {'edges': [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0]},
     'joint_var': {'name_in': 'Cloud_Effective_Radius',
      'edges': [4.0, 8.0, 10.0, 13.0, 15.0, 20.0, 30.0]}}],
   'masks': ['Mask_Valid_Range_CER', 'Mask_Liquid_Water_Phase_Clouds']},
  {'name_in': 'Cloud_Optical_Thickness',
   'name_out': 'Cloud_Optical_Thickness_Ice',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Optical Thickness for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
    {'name': 'units', 'value': 'none'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 0.0},
    {'name': 'valid_max', 'value': 150.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   '2D_histograms': [{'name_out': 'JHisto_vs_Cloud_Particle_Size_Ice',
     'primary_var': {'edges': [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0]},
     'joint_var': {'name_in': 'Cloud_Effective_Radius',
      'edges': [5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]}}],
   'masks': ['Mask_Ice_Phase_Clouds']},
  {'name_in': 'Cloud_Optical_Thickness',
   'name_out': 'Cloud_Optical_Thickness_Total',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Optical Thickness for Combined (LiquidWater+Ice+Undetermined) Phase Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
    {'name': 'units', 'value': 'none'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 0.0},
    {'name': 'valid_max', 'value': 150.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   '2D_histograms': [{'name_out': 'JHisto_vs_Cloud_Top_Pressure',
     'primary_var': {'edges': [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0]},
     'joint_var': {'name_in': 'Cloud_Top_Pressure',
      'edges': [0.0, 180.0, 310.0, 440.0, 560.0, 680.0, 800.0, 10000.0]}}],
   'masks': ['Mask_Valid_Range_CER', 'Mask_Combined_Phase_Clouds']},
  {'name_in': 'Cloud_Optical_Thickness_PCL',
   'name_out': 'Cloud_Optical_Thickness_PCL_Total',
   'only_histograms': None,
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Optical Thickness for Combined (LiquidWater+Ice+Undetermined) Phase Clouds (3.7 micron Retrieval for Partly Cloudy (PCL) Scenes)'},
    {'name': 'units', 'value': 'none'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 0.0},
    {'name': 'valid_max', 'value': 150.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   '2D_histograms': [{'name_out': 'JHisto_vs_Cloud_Top_Pressure',
     'primary_var': {'edges': [0.0, 0.3, 1.3, 3.6, 9.4, 23.0, 60.0, 150.0]},
     'joint_var': {'name_in': 'Cloud_Top_Pressure',
      'edges': [0.0, 180.0, 310.0, 440.0, 560.0, 680.0, 800.0, 10000.0]}}],
   'masks': ['Mask_Valid_Range_CERPCL', 'Mask_Combined_Phase_Clouds']},
  {'name_in': 'Cloud_Optical_Thickness_Log',
   'name_out': 'Cloud_Optical_Thickness_Log10_Liquid',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Optical Thickness Log10 for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
    {'name': 'units', 'value': 'none'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': -2.0},
    {'name': 'valid_max', 'value': 2.176},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   'masks': ['Mask_Valid_Range_CER', 'Mask_Liquid_Water_Phase_Clouds']},
  {'name_in': 'Cloud_Optical_Thickness_Log',
   'name_out': 'Cloud_Optical_Thickness_Log10_Ice',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Optical Thickness Log10 for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
    {'name': 'units', 'value': 'none'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': -2.0},
    {'name': 'valid_max', 'value': 2.176},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   'masks': ['Mask_Ice_Phase_Clouds']},
  {'name_in': 'Cloud_Optical_Thickness_Log',
   'name_out': 'Cloud_Optical_Thickness_Log10_Total',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Optical Thickness Log10 for Combined (LiquidWater+Ice+Undetermined) Phase Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
    {'name': 'units', 'value': 'none'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': -2.0},
    {'name': 'valid_max', 'value': 2.176},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   'masks': ['Mask_Valid_Range_CER', 'Mask_Combined_Phase_Clouds']},
  {'name_in': 'Cloud_Effective_Radius',
   'name_out': 'Cloud_Particle_Size_Liquid',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Effective Radius for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
    {'name': 'units', 'value': 'microns'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 4.0},
    {'name': 'valid_max', 'value': 30.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   'masks': ['Mask_Valid_Range_CER', 'Mask_Liquid_Water_Phase_Clouds']},
  {'name_in': 'Cloud_Effective_Radius',
   'name_out': 'Cloud_Particle_Size_Ice',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Effective Radius for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
    {'name': 'units', 'value': 'microns'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 5.0},
    {'name': 'valid_max', 'value': 60.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   'masks': ['Mask_Ice_Phase_Clouds']},
  {'name_in': 'Cloud_Water_Path',
   'name_out': 'Cloud_Water_Path_Liquid',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Water Path for Liquid Water Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
    {'name': 'units', 'value': 'g/m^2'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 0.0},
    {'name': 'valid_max', 'value': 3000.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   'masks': ['Mask_Valid_Range_CER', 'Mask_Liquid_Water_Phase_Clouds']},
  {'name_in': 'Cloud_Water_Path',
   'name_out': 'Cloud_Water_Path_Ice',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Water Path for Ice Clouds (3.7 micron Retrieval for Cloudy Scenes)'},
    {'name': 'units', 'value': 'g/m^2'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 0.0},
    {'name': 'valid_max', 'value': 6000.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}],
   'masks': ['Mask_Ice_Phase_Clouds']},
  {'name_in': 'COPR_Liquid',
   'name_out': 'Cloud_Retrieval_Fraction_Liquid',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Optical Properties Retrieval Fraction (Liquid Water Clouds)'},
    {'name': 'units', 'value': 'none'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 0.0},
    {'name': 'valid_max', 'value': 1.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}]},
  {'name_in': 'COPR_Ice',
   'name_out': 'Cloud_Retrieval_Fraction_Ice',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Optical Properties Retrieval Fraction (Ice Clouds)'},
    {'name': 'units', 'value': 'none'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 0.0},
    {'name': 'valid_max', 'value': 1.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}]},
  {'name_in': 'COPR_Combined',
   'name_out': 'Cloud_Retrieval_Fraction_Total',
   'attributes': [{'name': 'long_name',
     'value': 'Cloud Optical Properties Retrieval Fraction (Combined (LiquidWater+Ice+Undetermined) Phase Clouds)'},
    {'name': 'units', 'value': 'none'},
    {'name': '_FillValue', 'value': -999.0},
    {'name': 'valid_min', 'value': 0.0},
    {'name': 'valid_max', 'value': 1.0},
    {'name': 'scale_factor', 'value': 1.0},
    {'name': 'add_offset', 'value': 0.0}]}]}

cisaacstern avatar Mar 23 '22 19:03 cisaacstern

@cisaacstern This part of the code is supposed to create a copy?

with fsspec.open(make_url(dates[0])) as src:
    with open(test_filename, mode="wb") as dst:
        dst.write(src.read())

Because the file created is much smaller than the original:

 % ls -lt *.nc
-rw-r--r--@ 1 robert  staff      47668 Mar 23 15:32 test.nc
-rw-r--r--@ 1 robert  staff   40091481 Sep 21  2021 MCD06COSP_M3_MODIS.A2021182.061.2021250210032.nc

RobertPincus avatar Mar 23 '22 19:03 RobertPincus

Yes, that's the code block which aims to download the file.

How did you get this 40 MB MCD06COSP_M3_MODIS.A2021182.061.2021250210032.nc?

When I navigate to the GUI at

https://ladsweb.modaps.eosdis.nasa.gov/opendap/hyrax/allData/61/MCD06COSP_M3_MODIS/2002/182/MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc.dmr.html

and click Get as NetCDF 4, the file my web browser downloads is 47668 bytes

➜  Downloads ls -lt *.nc4
-rw-r--r--@ 1 charlesstern  staff  47668 Mar 23 12:56 MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc.nc4

which is the same size as the test.nc retrieved by that code block.

Looks like your 40 MB MCD06COSP_M3_MODIS.A2021182.061.2021250210032.nc was downloaded last September? Perhaps this Hyrax server is truly just not working right now, as Ryan previously hypothesized?

cisaacstern avatar Mar 23 '22 20:03 cisaacstern

... hmm on closer reading your file has an updated_at slug of 2021250210032 whereas somehow I'm pointing at 2020181145824 which is an older version... I'm going to look into that now.
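
For reference, the numeric fields in these filenames can be decoded with the standard library. The sketch below assumes the usual MODIS naming convention (an `A` followed by year and day of year for the data date, then the collection, then a year + day-of-year + HHMMSS production timestamp); `parse_modis_name` is a hypothetical helper, not part of any library. Decoding the two names shows that they differ in data month as well as in production time.

```python
from datetime import datetime


def parse_modis_name(filename):
    """Split a name shaped like PRODUCT.AYYYYDDD.CCC.YYYYDDDHHMMSS.nc into parts."""
    product, acquired, collection, produced, _ext = filename.split(".")
    return {
        "product": product,
        # Drop the leading "A", then read year + day of year.
        "data_date": datetime.strptime(acquired[1:], "%Y%j").date(),
        "collection": collection,
        "produced_at": datetime.strptime(produced, "%Y%j%H%M%S"),
    }


mine = parse_modis_name("MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc")
yours = parse_modis_name("MCD06COSP_M3_MODIS.A2021182.061.2021250210032.nc")

for label, parts in (("mine", mine), ("yours", yours)):
    print(label, parts["data_date"], parts["produced_at"])
```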

cisaacstern avatar Mar 23 '22 20:03 cisaacstern

The better comparison would be to

-rw-r--r--@ 1 robert  staff  68513011 Mar  1 14:00 MCD06COSP_M3_MODIS.A2021182.061.2022052174444.nc

RobertPincus avatar Mar 23 '22 20:03 RobertPincus

Thanks for these helpful clarifications re: expected data size, Robert. I've made considerable headway with both file retrieval and a draft of the recipes themselves. Buckle up for a longish but hopefully useful post.

Exploring the LAADS DAAC website a bit turned up the HTTP file service, e.g.

https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MCD06COSP_M3_MODIS/2002/182/

demonstrates a wget example using the authentication option

wget ... --header "Authorization: Bearer INSERT_DOWNLOAD_TOKEN_HERE"

After generating a token according to these instructions and exporting it as the EARTHDATA_TOKEN env variable, this authentication style can be adapted to download a complete file via fsspec as follows

import os
import fsspec

base_url = (
    "https://ladsweb.modaps.eosdis.nasa.gov/"
    "archive/allData/61/MCD06COSP_M3_MODIS/2002/182"
)
filename = "MCD06COSP_M3_MODIS.A2002182.061.2020181145824.nc"

with fsspec.open(
    f"{base_url}/{filename}",
    client_kwargs=dict(headers=dict(Authorization=f"Bearer {os.environ['EARTHDATA_TOKEN']}")),
) as src:
    with open(filename, mode="wb") as dst:
        dst.write(src.read())
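
For files of a few tens of MB, `src.read()` is fine. If the inputs were much larger, a chunked copy would avoid holding a whole file in memory, and the byte count could be compared against the expected size. A minimal sketch, using `io.BytesIO` objects as stand-ins for the remote and local files (an assumption for illustration; `copy_in_chunks` is a hypothetical helper):

```python
import io
import shutil


def copy_in_chunks(src, dst, chunk_size=1024 * 1024):
    """Copy a readable binary file object to a writable one, one chunk at a time."""
    shutil.copyfileobj(src, dst, chunk_size)


# Stand-ins for the remote source and the local destination.
payload_size = 3 * 1024 * 1024 + 17
src = io.BytesIO(b"\x00" * payload_size)
dst = io.BytesIO()

copy_in_chunks(src, dst)
copied_bytes = len(dst.getvalue())
print(copied_bytes == payload_size)
```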

The resulting file has an openable group for each of the names provided in its ds.YAML_config

import xarray as xr
import yaml

ds = xr.open_dataset(filename)
yaml_config = yaml.safe_load(ds.YAML_config)
group_names = [v["name_out"] for v in yaml_config["variable_settings"]]

has_groups = []
for group in group_names:
    try:
        ds = xr.open_dataset(filename, group=group)
    except OSError as e:
        print(e)
    else:
        has_groups.append(group)

print(has_groups)
['Solar_Zenith', 'Solar_Azimuth', 'Sensor_Zenith', 'Sensor_Azimuth', 'Cloud_Top_Pressure', 'Cloud_Mask_Fraction', 'Cloud_Mask_Fraction_Low', 'Cloud_Mask_Fraction_Mid', 'Cloud_Mask_Fraction_High', 'Cloud_Optical_Thickness_Liquid', 'Cloud_Optical_Thickness_Ice', 'Cloud_Optical_Thickness_Total', 'Cloud_Optical_Thickness_PCL_Total', 'Cloud_Optical_Thickness_Log10_Liquid', 'Cloud_Optical_Thickness_Log10_Ice', 'Cloud_Optical_Thickness_Log10_Total', 'Cloud_Particle_Size_Liquid', 'Cloud_Particle_Size_Ice', 'Cloud_Water_Path_Liquid', 'Cloud_Water_Path_Ice', 'Cloud_Retrieval_Fraction_Liquid', 'Cloud_Retrieval_Fraction_Ice', 'Cloud_Retrieval_Fraction_Total']

With this file access knowledge in hand, we can write a dictionary containing a naive XarrayZarrRecipe for each group as follows.

Note: Each of these recipes concatenates the given group into a time series spanning all months covered in the dates sequence. To make this possible, I define a process_input function which adds the "date" dimension to each group, because as provided by LAADS DAAC the groups do not have any temporal dimension along which to concatenate.

import os

import pandas as pd
import requests

from pangeo_forge_recipes.patterns import ConcatDim, FilePattern
from pangeo_forge_recipes.recipes import XarrayZarrRecipe

GROUPS = [
    'Solar_Zenith',
    'Solar_Azimuth',
    'Sensor_Zenith',
    'Sensor_Azimuth',
    'Cloud_Top_Pressure',
    'Cloud_Mask_Fraction',
    'Cloud_Mask_Fraction_Low',
    'Cloud_Mask_Fraction_Mid',
    'Cloud_Mask_Fraction_High',
    'Cloud_Optical_Thickness_Liquid',
    'Cloud_Optical_Thickness_Ice',
    'Cloud_Optical_Thickness_Total',
    'Cloud_Optical_Thickness_PCL_Total',
    'Cloud_Optical_Thickness_Log10_Liquid',
    'Cloud_Optical_Thickness_Log10_Ice',
    'Cloud_Optical_Thickness_Log10_Total',
    'Cloud_Particle_Size_Liquid',
    'Cloud_Particle_Size_Ice',
    'Cloud_Water_Path_Liquid',
    'Cloud_Water_Path_Ice',
    'Cloud_Retrieval_Fraction_Liquid',
    'Cloud_Retrieval_Fraction_Ice',
    'Cloud_Retrieval_Fraction_Total',
]

BASE_URL = "https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/61/MCD06COSP_M3_MODIS"

dates = pd.date_range("2002-07-01", "2021-07-01", freq="MS")  # "MS" for "month start"

concat_dim = ConcatDim("date", keys=dates, nitems_per_file=1)


def make_url(date):
    """Make a NetCDF4 download url for NASA MODIS-COSP data based on an input date.
    
    :param date: A member of the ``pandas.core.indexes.datetimes.DatetimeIndex``
        created with ``dates = pd.date_range("2002-07-01", "2021-07-01", freq="MS")``.
    """
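    # Zero-pad the day of year: the archive paths seen so far use three digits
    # (e.g. ".../2002/182/"), so days below 100 are assumed to look like "001".
    day_of_year = f"{date.timetuple().tm_yday:03d}"
    response = requests.get(f"{BASE_URL}/{date.year}/{day_of_year}.json")
    filename = [r["name"] for r in response.json()].pop(0)
    
    return f"{BASE_URL}/{date.year}/{day_of_year}/{filename}"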


pattern = FilePattern(
    make_url,
    concat_dim,
    fsspec_open_kwargs={
        "client_kwargs": dict(headers=dict(Authorization=f"Bearer {os.environ['EARTHDATA_TOKEN']}"))
    },
)


def process_input(ds, filename):
    """Add missing "date" dimension to dataset to facilitate concatenation.
    """
    import xarray as xr
    
    return xr.concat([ds], dim="date")


per_group_recipes = {
    group: XarrayZarrRecipe(
        pattern,
        xarray_open_kwargs=dict(group=group),
        process_input=process_input,
    )
    for group in GROUPS
}
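
One caveat about make_url above: it takes the first entry in the directory listing. If a directory ever held more than one production version of the same month, something like the sketch below could pick the most recently produced file instead. It assumes each listing entry is a dict with a "name" key (as the code above does) and that names follow the pattern seen in this thread; `latest_version` is a hypothetical helper and the listing is illustrative.

```python
def latest_version(listing):
    """Return the name with the newest production timestamp.

    Assumes names shaped like PRODUCT.AYYYYDDD.CCC.YYYYDDDHHMMSS.nc, so the
    fourth dot-separated field sorts chronologically as a plain string.
    """
    names = [entry["name"] for entry in listing if entry["name"].endswith(".nc")]
    return max(names, key=lambda name: name.split(".")[3])


# Illustrative listing built from two filenames quoted earlier in this thread.
listing = [
    {"name": "MCD06COSP_M3_MODIS.A2021182.061.2021250210032.nc"},
    {"name": "MCD06COSP_M3_MODIS.A2021182.061.2022052174444.nc"},
]
chosen = latest_version(listing)
print(chosen)
```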

We cannot execute these recipes on Pangeo Forge Cloud yet, because we don't yet have a mechanism to securely manage credentials (xref https://github.com/pangeo-forge/roadmap/pull/36). However, I did execute a 2-month temporal subset of each of these recipes locally (and anyone else can too) with the following code:

NOTE: Running the code below will create 23 new subdirectories (i.e. Zarr stores, which are directories) within the current working directory.

from fsspec.implementations.local import LocalFileSystem
from pangeo_forge_recipes.recipes import setup_logging
from pangeo_forge_recipes.storage import CacheFSSpecTarget, FSSpecTarget

fs_local = LocalFileSystem()

setup_logging("DEBUG")

for group_name, recipe in per_group_recipes.items():
    print(f"\n\n Building {group_name} onto local storage...")
    recipe.storage_config.cache = CacheFSSpecTarget(fs_local, "cache")
    recipe.storage_config.target = FSSpecTarget(fs_local, group_name + ".zarr")

    recipe_pruned = recipe.copy_pruned()
    recipe_pruned.to_function()()

and the resulting Zarr stores (one for each group) can be accessed with

import xarray as xr

ds = xr.open_zarr(f"{group_name}.zarr", consolidated=True)

By way of conclusion for now: based on this test, I'd estimate that the full temporal scope of each of these recipes would build Zarr stores of between ~1.1 and 12.8 GB per group, with a total dataset size (a full temporal run for each of the 23 groups) of about 69 GB:

all_groups_full_size = 0
for group in GROUPS:
    ds = xr.open_zarr(f"{group}.zarr", consolidated=True)
    group_pruned_size = round(ds.nbytes/1e6)
    group_full_size = group_pruned_size * len(dates)
    print(f"{group} {group_pruned_size} MB -> {group_full_size/1e3} GB")
    all_groups_full_size += group_full_size
    
print(f"\n{all_groups_full_size/1e3} GB")
Solar_Zenith 5 MB -> 1.145 GB
Solar_Azimuth 5 MB -> 1.145 GB
Sensor_Zenith 5 MB -> 1.145 GB
Sensor_Azimuth 5 MB -> 1.145 GB
Cloud_Top_Pressure 5 MB -> 1.145 GB
Cloud_Mask_Fraction 5 MB -> 1.145 GB
Cloud_Mask_Fraction_Low 5 MB -> 1.145 GB
Cloud_Mask_Fraction_Mid 5 MB -> 1.145 GB
Cloud_Mask_Fraction_High 5 MB -> 1.145 GB
Cloud_Optical_Thickness_Liquid 49 MB -> 11.221 GB
Cloud_Optical_Thickness_Ice 49 MB -> 11.221 GB
Cloud_Optical_Thickness_Total 56 MB -> 12.824 GB
Cloud_Optical_Thickness_PCL_Total 51 MB -> 11.679 GB
Cloud_Optical_Thickness_Log10_Liquid 5 MB -> 1.145 GB
Cloud_Optical_Thickness_Log10_Ice 5 MB -> 1.145 GB
Cloud_Optical_Thickness_Log10_Total 5 MB -> 1.145 GB
Cloud_Particle_Size_Liquid 5 MB -> 1.145 GB
Cloud_Particle_Size_Ice 5 MB -> 1.145 GB
Cloud_Water_Path_Liquid 5 MB -> 1.145 GB
Cloud_Water_Path_Ice 5 MB -> 1.145 GB
Cloud_Retrieval_Fraction_Liquid 5 MB -> 1.145 GB
Cloud_Retrieval_Fraction_Ice 5 MB -> 1.145 GB
Cloud_Retrieval_Fraction_Total 5 MB -> 1.145 GB

68.7 GB
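
A quick cross-check of these figures, as a sketch under stated assumptions: each pruned store holds the 2-month subset mentioned above, and a scalar group holds five float64 fields on a 360 x 180 grid (neither the field count nor the grid is confirmed at this point in the thread). Under those assumptions the 5 MB figure already covers two months, so scaling by the number of months overstates the full size by about a factor of two.

```python
# Assumptions for illustration: five float64 fields per scalar group on a
# 360 x 180 grid, and a pruned store that keeps 2 months.
N_FIELDS = 5
GRID_CELLS = 360 * 180
BYTES_PER_VALUE = 8  # float64
PRUNED_MONTHS = 2

# Month starts from 2002-07 through 2021-07 inclusive, matching `dates`.
n_months = (2021 - 2002) * 12 + 1

pruned_mb = N_FIELDS * GRID_CELLS * BYTES_PER_VALUE * PRUNED_MONTHS / 1e6
per_month_mb = pruned_mb / PRUNED_MONTHS
full_gb = per_month_mb * n_months / 1e3

print(f"pruned store: {pruned_mb:.2f} MB for {PRUNED_MONTHS} months")
print(f"full record:  {full_gb:.2f} GB for {n_months} months")
```

If the assumptions hold, dividing `ds.nbytes` by `ds.sizes["date"]` before multiplying by `len(dates)` would give a per-group figure near 0.6 GB for the scalar groups rather than 1.145 GB.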

cisaacstern avatar Mar 24 '22 04:03 cisaacstern

@cisaacstern Thanks so much for continuing to work on this; it's spectacular.

I'm not sure how y'all think of things at Pangeo-forge but, from a science user's perspective, there's a lot to be gained by more targeted processing. (By way of background, for some groups we want to extract only one field of four; for other groups we want to do some arithmetic on existing fields.)

My understanding is that I should create a dictionary containing a set of XarrayZarrRecipes, where each process_input keyword points to the appropriate function? For example, I might have extract_selected_fields, which creates a dataset from the Mean variable of a set of groups (each renamed to the group name, so Cloud_Top_Pressure.Mean becomes Cloud_Top_Pressure)? And the recipes that share input files will not download the files over and over?

Is there a way to handle appending new data as it is produced, month by month?

RobertPincus avatar Mar 24 '22 12:03 RobertPincus

Question: do these groups contain variables with the same dimensions / coordinates? If so, it would make sense logically to merge them into a single dataset. (That is not possible today but would become possible with the Opener refactor.)

rabernat avatar Mar 24 '22 13:03 rabernat

All variables share location and time coordinates. I would package all the scalar fields together in a single dataset. There are also some joint histograms with the same location and time coordinates but different histogram bins. Because they don't share bin definitions, and because they're large, I had thought to create separate datasets for each unique set of bin definitions.

RobertPincus avatar Mar 24 '22 13:03 RobertPincus

There is no inherent size limit to the zarr group, because it is not a single file. It's all about doing whatever is most convenient for the person analyzing the data. In this case, it sounds like we want just one big dataset.

As long as the dimensions use distinct names, we should be fine to merge into a single dataset. I.e. bins: 50 and bins: 70 would cause merge errors, but Cloud_Water_Path_Liquid_bins: 50 and Cloud_Retrieval_Fraction_Ice_bins: 70 would be fine.
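
A small pre-merge check along these lines can flag any dimension name that appears with more than one size. This is only a sketch: `conflicting_dims` is a hypothetical helper, the sizes are illustrative (echoing the 50 / 70 example), and in practice each mapping might come from something like `dict(ds.sizes)` for the corresponding group.

```python
def conflicting_dims(dims_by_group):
    """Return {dim_name: {size: [groups]}} for names used with more than one size."""
    seen = {}
    for group, dims in dims_by_group.items():
        for name, size in dims.items():
            seen.setdefault(name, {}).setdefault(size, []).append(group)
    return {name: sizes for name, sizes in seen.items() if len(sizes) > 1}


# Illustrative sizes only.
shared_name = {
    "Cloud_Water_Path_Liquid": {"latitude": 180, "longitude": 360, "bins": 50},
    "Cloud_Retrieval_Fraction_Ice": {"latitude": 180, "longitude": 360, "bins": 70},
}
distinct_names = {
    "Cloud_Water_Path_Liquid": {
        "latitude": 180, "longitude": 360, "Cloud_Water_Path_Liquid_bins": 50,
    },
    "Cloud_Retrieval_Fraction_Ice": {
        "latitude": 180, "longitude": 360, "Cloud_Retrieval_Fraction_Ice_bins": 70,
    },
}

print(conflicting_dims(shared_name))     # "bins" is reported with both sizes
print(conflicting_dims(distinct_names))  # nothing to report
```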

We cannot execute these recipes on Pangeo Forge Cloud yet, because we don't yet have a mechanism to securely manage credentials

Charles, I wonder if it is worthwhile to just special case earthdata login and inject some earthdata login credentials directly into our environments. This would allow us to move forward with some of these recipes before we solve the general secrets problem.

rabernat avatar Mar 24 '22 14:03 rabernat

Yes, merging is definitely the way to go. As Ryan said, we'll need https://github.com/pangeo-forge/pangeo-forge-recipes/pull/245 to do this in a single recipe, but we can do it today in two steps, which I've done to complete the end-to-end demonstration.

  1. I exported the outputs of each of the recipes in my last comment with ds.to_netcdf and cached those files to our OSN bucket at these publicly accessible paths:

    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Solar_Zenith.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Solar_Azimuth.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Sensor_Zenith.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Sensor_Azimuth.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Top_Pressure.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Mask_Fraction.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Mask_Fraction_Low.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Mask_Fraction_Mid.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Mask_Fraction_High.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Optical_Thickness_Liquid.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Optical_Thickness_Ice.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Optical_Thickness_Total.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Optical_Thickness_PCL_Total.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Optical_Thickness_Log10_Liquid.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Optical_Thickness_Log10_Ice.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Optical_Thickness_Log10_Total.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Particle_Size_Liquid.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Particle_Size_Ice.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Water_Path_Liquid.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Water_Path_Ice.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Retrieval_Fraction_Liquid.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Retrieval_Fraction_Ice.nc
    https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/modis-cosp/cache/Cloud_Retrieval_Fraction_Total.nc
    
  2. I wrote a second recipe to merge these inputs into a single Zarr store:

    from pangeo_forge_recipes.patterns import ConcatDim, FilePattern, MergeDim
    from pangeo_forge_recipes.recipes import XarrayZarrRecipe
    
    concat_dim = ConcatDim("date", keys=[0,], nitems_per_file=2)
    
    # Here `GROUPS` is the list defined in:
    # https://github.com/pangeo-forge/staged-recipes/issues/125#issuecomment-1077053600
    merge_dim = MergeDim("group", keys=GROUPS)
    
    def make_url(date, group):
        base_url = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge"
        return f"{base_url}/modis-cosp/cache/{group}.nc"
    
    def process_input(ds, filename):
        """Add a group name abbreviation to each data variable name.
        """
        group = filename.split("/modis-cosp/cache/")[-1].replace(".nc", "")
        abbreviation = (
            "".join([word[0] for word in group.split("_")])  # e.g. 'Cloud_Top_Pressure' -> 'CTP'
            if not group.startswith("S")  # special casing to disambiguate 'Solar_*' & 'Sensor_*'
            else group[:3] + group.split("_")[-1][0]  # e.g. 'Solar_Zenith' -> 'SolZ'; 'Sensor_Zenith' -> 'SenZ'
        )
        return ds.rename_vars({v: f"{abbreviation}_{v}" for v in ds.data_vars})
    
    pattern = FilePattern(make_url, concat_dim, merge_dim)
    
    recipe = XarrayZarrRecipe(pattern, process_input=process_input)
    
  3. I ran this recipe locally and then manually copied the output to our OSN bucket. The resulting Zarr store (2 time steps, 114 data variables) can be opened with:

    import fsspec
    import xarray as xr
    
    base_url = "https://ncsa.osn.xsede.org/Pangeo/pangeo-forge"
    dataset_public_url = f"{base_url}/modis-cosp/modis-cosp-demo.zarr"
    mapper = fsspec.get_mapper(dataset_public_url)
    ds = xr.open_zarr(mapper, consolidated=True)
    print(ds)
    
    <xarray.Dataset>
    Dimensions:  (date: 2, longitude: 360,
                  latitude: 180,
                  jhisto_cloud_optical_thickness_ice_7: 7,
                  jhisto_cloud_particle_size_ice_6: 6,
                  jhisto_cloud_optical_thickness_liquid_7: 7,
                  jhisto_cloud_particle_size_liquid_6: 6,
                  jhisto_cloud_optical_thickness_pcl_total_7: 7,
                  jhisto_cloud_top_pressure_7: 7,
                  jhisto_cloud_optical_thickness_total_7: 7)
    Dimensions without coordinates: date, longitude, latitude,
                                    jhisto_cloud_optical_thickness_ice_7,
                                    jhisto_cloud_particle_size_ice_6,
                                    jhisto_cloud_optical_thickness_liquid_7,
                                    jhisto_cloud_particle_size_liquid_6,
                                    jhisto_cloud_optical_thickness_pcl_total_7,
                                    jhisto_cloud_top_pressure_7,
                                    jhisto_cloud_optical_thickness_total_7
    Data variables: (12/114)
        CMFH_Mean                (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
        CMFH_Pixel_Counts        (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
        CMFH_Standard_Deviation  (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
        CMFH_Sum                 (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
        CMFH_Sum_Squares         (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
        CMFL_Mean                (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
        ...                      ...
        SolA_Sum_Squares         (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
        SolZ_Mean                (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
        SolZ_Pixel_Counts        (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
        SolZ_Standard_Deviation  (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
        SolZ_Sum                 (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
        SolZ_Sum_Squares         (date, longitude, latitude) float64 dask.array<chunksize=(2, 360, 180), meta=np.ndarray>
    Attributes:
        _FillValue:    -999.0
        add_offset:    0.0
        long_name:     Cloud Optical Properties Retrieval Fraction (Combined (Liq...
        scale_factor:  1.0
        units:         none
        valid_max:     1.0
        valid_min:     0.0
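
Since the merged store relies on the abbreviations being distinct, here is a standalone check of the renaming rule used in process_input above. It is a sketch that restates the same rule as a plain function (`abbreviate` is a hypothetical name) and confirms that no two of the 23 groups collide.

```python
GROUPS = [
    "Solar_Zenith", "Solar_Azimuth", "Sensor_Zenith", "Sensor_Azimuth",
    "Cloud_Top_Pressure",
    "Cloud_Mask_Fraction", "Cloud_Mask_Fraction_Low",
    "Cloud_Mask_Fraction_Mid", "Cloud_Mask_Fraction_High",
    "Cloud_Optical_Thickness_Liquid", "Cloud_Optical_Thickness_Ice",
    "Cloud_Optical_Thickness_Total", "Cloud_Optical_Thickness_PCL_Total",
    "Cloud_Optical_Thickness_Log10_Liquid", "Cloud_Optical_Thickness_Log10_Ice",
    "Cloud_Optical_Thickness_Log10_Total",
    "Cloud_Particle_Size_Liquid", "Cloud_Particle_Size_Ice",
    "Cloud_Water_Path_Liquid", "Cloud_Water_Path_Ice",
    "Cloud_Retrieval_Fraction_Liquid", "Cloud_Retrieval_Fraction_Ice",
    "Cloud_Retrieval_Fraction_Total",
]


def abbreviate(group):
    """Apply the same abbreviation rule as process_input, as a plain function."""
    if group.startswith("S"):
        # 'Solar_Zenith' -> 'SolZ'; 'Sensor_Zenith' -> 'SenZ'
        return group[:3] + group.split("_")[-1][0]
    # 'Cloud_Top_Pressure' -> 'CTP'
    return "".join(word[0] for word in group.split("_"))


abbreviations = {group: abbreviate(group) for group in GROUPS}
all_unique = len(set(abbreviations.values())) == len(GROUPS)
print(all_unique, abbreviations["Cloud_Mask_Fraction_High"])
```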
    

I'll respond to the other questions/comments in another comment.

cisaacstern avatar Mar 24 '22 21:03 cisaacstern

there's a lot to be gained by more targeted processing. ... My understanding is that I should create a dictionary containing a set of XarrayZarrRecipes, where each process_input keyword points to the appropriate function?

Correct. As described in the API Reference, process_input functions must have the signature

def process_input(ds: xr.Dataset, filename: str) -> xr.Dataset

so to use the group name (for renaming variables, etc.) within process_input, you'll need to get it either from the filename (as I did above) or perhaps ds.attrs["long_name"]. And yes, you can apply any arithmetic, etc. within this function as well, and then just return the ds as you'd like it to appear in the recipe's output dataset.
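
Because the signature is fixed at two arguments, another way to get the group name into the function (without parsing the filename) might be to build each process_input from a small factory. The sketch below uses a plain dict as a stand-in for an xr.Dataset so that it runs without any data; `make_process_input` and `keep` are hypothetical names, not part of pangeo-forge-recipes. A factory is used rather than a lambda written inline in the comprehension, since an inline lambda would look up `group` only when called and every entry would then see the last group.

```python
def make_process_input(group, keep=("Mean",)):
    """Build a two-argument process_input bound to one group.

    `keep` lists the fields to retain. A single kept field is renamed to the
    group name; several kept fields are prefixed with it.
    """
    def process_input(ds, filename):
        if len(keep) == 1:
            return {group: ds[keep[0]]}
        return {f"{group}_{field}": ds[field] for field in keep}

    return process_input


GROUPS = ["Cloud_Top_Pressure", "Cloud_Mask_Fraction"]
process_inputs = {group: make_process_input(group) for group in GROUPS}

# A plain dict stands in for an xr.Dataset here (assumption for illustration).
fake_ds = {"Mean": [1.0, 2.0], "Sum": [10.0, 20.0], "Pixel_Counts": [10, 10]}
result = process_inputs["Cloud_Top_Pressure"](fake_ds, "ignored.nc")
print(result)
```

With a real dataset the body would more likely select and rename variables on the Dataset itself, and would still need to add the "date" dimension as in the earlier recipes.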

I agree that a great next step would be for you to refine the per-group recipes I prototyped in my earlier comment so that the per-group Zarr stores they output look as you'd like them to. (Merging all these together will become a lot simpler once the above-referenced refactor is complete, so we don't need to worry about that for now.)

As you go along, you can run local tests of your recipes as described in the Running a Recipe Locally docs. Once you hit a point where you have questions, rather than posting your code in comments as I've done here, I'd recommend Making a PR, which will make it easier for me to clone and work with your code.

And the recipes that share input files will not download the files over and over?

Once we've put everything together into one recipe, yes this will be true. In the interim, while we still have a single recipe for each group, that won't happen automatically, because each recipe maintains its own cache. If you get to a point where this becomes a barrier to recipe development, just let me know and I can show you some advanced config to point all of the recipes to a single cache. I'd recommend trying to execute a few recipes first before we get into that, though.

Is there a way to handle appending new data as it is produced, month by month?

This is on the roadmap (xref https://github.com/pangeo-forge/pangeo-forge-recipes/issues/37) but for now the solution for this would be to just overwrite the original dataset with an updated date range once new data is released. For this particular dataset, that does not concern me too much, because the entire dataset is less than 100 GB, which is on the low end of what our infrastructure is designed to handle, so re-writing the whole thing should be relatively fast (a few hours, maybe).

I wonder if it is worthwhile to just special case earthdata login and inject some earthdata login credentials directly into our environments.

Yes, that's a good idea. And we may end up wanting to do the same for other commonly used portals.

cisaacstern avatar Mar 24 '22 22:03 cisaacstern

I'm not sure how y'all think of things at Pangeo-forge but, from a science user's perspective, there's a lot to be gained by more targeted processing.

Last comment for now, but I wanted to add this because I realized I did not answer the aesthetic dimension of this question. The aim of Pangeo Forge is to produce analysis-ready, cloud-optimized (ARCO) datasets. The XarrayZarrRecipe will take care of the cloud-optimized part, but since you are the domain expert, we defer to you for the analysis-ready part. You should absolutely apply whatever preprocessing will make this data a dream to work with, and which will help you and other scientists minimize, or even eliminate, the latency between opening this dataset and getting started on your/their science. Our ideal world is one in which you open this dataset and breathe a sigh of relief: "Ah, what a relief, this dataset is ready to go!"

cisaacstern avatar Mar 24 '22 22:03 cisaacstern

@cisaacstern I've cloned this repo and started work on my recipe, building on your generous help. A couple questions arising:

  • As you note, the signature for process_input is process_input(ds: xr.Dataset, filename: str) -> xr.Dataset. My understanding is that ds is the result of ds = xr.open_dataset(filename, **client_kwargs). Is that correct? If so, I guess it's OK to make other calls to xr.open_dataset() with different arguments within the body of process_input()?

  • What is the preferred way at present to loop over a collection of recipes, as you do here, in the current environment?

  • Related: is it ok to have a recipe repo contain several recipes?

RobertPincus avatar Mar 25 '22 13:03 RobertPincus

In general I would not recommend calling open_dataset from within the preprocessing function, although I can see how that would be a useful hack to get around the fact that we cannot distinguish between different groups at the FilePattern level. So perhaps we do it for now and then refactor later once https://github.com/pangeo-forge/pangeo-forge-recipes/pull/245 is done.

  • is it ok to have a recipe repo contain several recipes?

Yes. They just have to be enumerated in meta.yaml.

Thanks for your patience with this Rob. It's very helpful for us to have willing guinea pigs. 🐹

rabernat avatar Mar 25 '22 14:03 rabernat

What is the preferred way at present to loop over a collection of recipes, as you do https://github.com/pangeo-forge/staged-recipes/issues/125#issuecomment-1077053600, in the current environment?

Everything in that linked comment should work as-is with the current release of pangeo-forge-recipes.

As I show there, generally I've found the most concise way to define a number of recipes with some overlapping kwargs and some unique kwargs is with a dictionary comprehension. But you can also just write them out, "long-hand", one at a time, which is more verbose but has the benefit of being more easily (human) readable.

For test execution of a collection of recipes, the code in that same linked comment should also work as-is, but certainly let me know if you find otherwise.

cisaacstern avatar Mar 25 '22 15:03 cisaacstern

@cisaacstern I'm coming back to this project and now have a conda environment that includes the pangeo-forge package. I'm a little unclear how the pieces of code are supposed to fit together. Looking at the other examples in this repo, it seems that recipe.py defines a single recipe that will eventually be executed with recipe.to_function()(). Your comment above goes beyond this, to define a dictionary per_group_recipes with each item being a recipe. You then execute in a loop over the dictionary elements. How would I arrange e.g. recipe.py to do this loop? I realize I could create a separate Python recipe file for each group but that seems like the long way round.

RobertPincus avatar Apr 01 '22 15:04 RobertPincus