
Add `tutorial` module for sample datasets

Open andersy005 opened this issue 1 year ago • 7 comments

  • [x] Towards #100
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] New functions/methods are listed in api.rst
  • [x] Changes are summarized in docs/source/whats-new.rst
In [1]: import datatree

In [3]: dt = datatree.tutorial.open_datatree('cesm2-lens')

In [4]: dt
Out[4]: 
DataTree('None', parent=None)
├── DataTree('ocn')
│   ├── DataTree('historical')
│   │   └── DataTree('monthly')
│   │       ├── DataTree('cmip6')
│   │       │       Dimensions:     (member_id: 1, time: 6, z_t: 1, nlat: 384, nlon: 320, d2: 2)
│   │       │       Coordinates:
│   │       │         * member_id   (member_id) object 'r10i1181p1f1'
│   │       │         * time        (time) object 1850-01-16 12:00:00 ... 1850-06-16 00:00:00
│   │       │           time_bound  (time, d2) object ...
│   │       │         * z_t         (z_t) float32 500.0
│   │       │       Dimensions without coordinates: nlat, nlon, d2
│   │       │       Data variables:
│   │       │           O2          (member_id, time, z_t, nlat, nlon) float32 ...
│   │       │       Attributes:
│   │       │           Conventions:             CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netcdf...
│   │       │           calendar:                All years have exactly  365 days.
│   │       │           cell_methods:            cell_methods = time: mean ==> the variable value...
│   │       │           contents:                Diagnostic and Prognostic Variables
│   │       │           model_doi_url:           https://doi.org/10.5065/D67H1H0V
│   │       │           revision:                $Id$
│   │       │           source:                  CCSM POP2, the CCSM Ocean Component
│   │       │           start_time:              This dataset was created on 2020-07-18 at 07:26:...
│   │       │           time_period_freq:        month_1
│   │       │           intake_esm_dataset_key:  ocn/historical/monthly/cmip6
│   │       └── DataTree('smbb')
│   │               Dimensions:     (member_id: 1, time: 6, z_t: 1, nlat: 384, nlon: 320, d2: 2)
│   │               Coordinates:
│   │                 * member_id   (member_id) object 'r11i1231p1f2'
│   │                 * time        (time) object 1850-01-16 12:00:00 ... 1850-06-16 00:00:00
│   │                   time_bound  (time, d2) object ...
│   │                 * z_t         (z_t) float32 500.0
│   │               Dimensions without coordinates: nlat, nlon, d2
│   │               Data variables:
│   │                   O2          (member_id, time, z_t, nlat, nlon) float32 ...
│   │               Attributes:
│   │                   Conventions:             CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netcdf...
│   │                   calendar:                All years have exactly  365 days.
│   │                   cell_methods:            cell_methods = time: mean ==> the variable value...
│   │                   contents:                Diagnostic and Prognostic Variables
│   │                   model_doi_url:           https://doi.org/10.5065/D67H1H0V
│   │                   revision:                $Id$
│   │                   source:                  CCSM POP2, the CCSM Ocean Component
│   │                   time_period_freq:        month_1
│   │                   intake_esm_dataset_key:  ocn/historical/monthly/smbb
│   └── DataTree('ssp370')
│       └── DataTree('monthly')
│           ├── DataTree('cmip6')
│           │       Dimensions:     (member_id: 1, time: 6, z_t: 1, nlat: 384, nlon: 320, d2: 2)
│           │       Coordinates:
│           │         * member_id   (member_id) object 'r10i1181p1f1'
│           │         * time        (time) object 2015-01-16 12:00:00 ... 2015-06-16 00:00:00
│           │           time_bound  (time, d2) object ...
│           │         * z_t         (z_t) float32 500.0
│           │       Dimensions without coordinates: nlat, nlon, d2
│           │       Data variables:
│           │           O2          (member_id, time, z_t, nlat, nlon) float32 ...
│           │       Attributes:
│           │           Conventions:             CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netcdf...
│           │           calendar:                All years have exactly  365 days.
│           │           cell_methods:            cell_methods = time: mean ==> the variable value...
│           │           contents:                Diagnostic and Prognostic Variables
│           │           model_doi_url:           https://doi.org/10.5065/D67H1H0V
│           │           revision:                $Id$
│           │           source:                  CCSM POP2, the CCSM Ocean Component
│           │           time_period_freq:        month_1
│           │           intake_esm_dataset_key:  ocn/ssp370/monthly/cmip6
│           └── DataTree('smbb')
│                   Dimensions:     (member_id: 1, time: 6, z_t: 1, nlat: 384, nlon: 320, d2: 2)
│                   Coordinates:
│                     * member_id   (member_id) object 'r11i1231p1f2'
│                     * time        (time) object 2015-01-16 12:00:00 ... 2015-06-16 00:00:00
│                       time_bound  (time, d2) object ...
│                     * z_t         (z_t) float32 500.0
│                   Dimensions without coordinates: nlat, nlon, d2
│                   Data variables:
│                       O2          (member_id, time, z_t, nlat, nlon) float32 ...
│                   Attributes:
│                       Conventions:             CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netcdf...
│                       calendar:                All years have exactly  365 days.
│                       cell_methods:            cell_methods = time: mean ==> the variable value...
│                       contents:                Diagnostic and Prognostic Variables
│                       model_doi_url:           https://doi.org/10.5065/D67H1H0V
│                       revision:                $Id$
│                       source:                  CCSM POP2, the CCSM Ocean Component
│                       time_period_freq:        month_1
│                       intake_esm_dataset_key:  ocn/ssp370/monthly/smbb
└── DataTree('atm')
    ├── DataTree('ssp370')
    │   └── DataTree('monthly')
    │       ├── DataTree('cmip6')
    │       │       Dimensions:    (member_id: 1, time: 6, lat: 192, lon: 288, nbnd: 2)
    │       │       Coordinates:
    │       │         * lat        (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
    │       │         * lon        (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
    │       │         * member_id  (member_id) object 'r10i1181p1f1'
    │       │         * time       (time) object 2015-01-16 12:00:00 ... 2015-06-16 00:00:00
    │       │           time_bnds  (time, nbnd) object ...
    │       │       Dimensions without coordinates: nbnd
    │       │       Data variables:
    │       │           PRECC      (member_id, time, lat, lon) float32 ...
    │       │           TREFHT     (member_id, time, lat, lon) float32 ...
    │       │       Attributes:
    │       │           source:                  CAM
    │       │           logname:                 sunseon
    │       │           Conventions:             CF-1.0
    │       │           time_period_freq:        month_1
    │       │           host:                    mom1
    │       │           topography_file:         /mnt/lustre/share/CESM/cesm_input/atm/cam/topo/f...
    │       │           model_doi_url:           https://doi.org/10.5065/D67H1H0V
    │       │           intake_esm_dataset_key:  atm/ssp370/monthly/cmip6
    │       └── DataTree('smbb')
    │               Dimensions:    (member_id: 1, time: 6, lat: 192, lon: 288, nbnd: 2)
    │               Coordinates:
    │                 * lat        (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
    │                 * lon        (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
    │                 * member_id  (member_id) object 'r10i1191p1f2'
    │                 * time       (time) object 2015-01-16 12:00:00 ... 2015-06-16 00:00:00
    │                   time_bnds  (time, nbnd) object ...
    │               Dimensions without coordinates: nbnd
    │               Data variables:
    │                   PRECC      (member_id, time, lat, lon) float32 ...
    │                   TREFHT     (member_id, time, lat, lon) float32 ...
    │               Attributes:
    │                   source:                  CAM
    │                   logname:                 sunseon
    │                   Conventions:             CF-1.0
    │                   time_period_freq:        month_1
    │                   topography_file:         /mnt/lustre/share/CESM/cesm_input/atm/cam/topo/f...
    │                   model_doi_url:           https://doi.org/10.5065/D67H1H0V
    │                   intake_esm_dataset_key:  atm/ssp370/monthly/smbb
    └── DataTree('historical')
        └── DataTree('monthly')
            ├── DataTree('cmip6')
            │       Dimensions:    (member_id: 1, time: 6, lat: 192, lon: 288, nbnd: 2)
            │       Coordinates:
            │         * lat        (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
            │         * lon        (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
            │         * member_id  (member_id) object 'r10i1181p1f1'
            │         * time       (time) object 1850-01-16 12:00:00 ... 1850-06-16 00:00:00
            │           time_bnds  (time, nbnd) object ...
            │       Dimensions without coordinates: nbnd
            │       Data variables:
            │           PRECC      (member_id, time, lat, lon) float32 ...
            │           TREFHT     (member_id, time, lat, lon) float32 ...
            │       Attributes:
            │           source:                  CAM
            │           logname:                 sunseon
            │           Conventions:             CF-1.0
            │           time_period_freq:        month_1
            │           NCO:                     netCDF Operators version 4.9.4 (Homepage = http:...
            │           topography_file:         /mnt/lustre/share/CESM/cesm_input/atm/cam/topo/f...
            │           model_doi_url:           https://doi.org/10.5065/D67H1H0V
            │           intake_esm_dataset_key:  atm/historical/monthly/cmip6
            └── DataTree('smbb')
                    Dimensions:    (member_id: 1, time: 6, lat: 192, lon: 288, nbnd: 2)
                    Coordinates:
                      * lat        (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
                      * lon        (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
                      * member_id  (member_id) object 'r10i1191p1f2'
                      * time       (time) object 1850-01-16 12:00:00 ... 1850-06-16 00:00:00
                        time_bnds  (time, nbnd) object ...
                    Dimensions without coordinates: nbnd
                    Data variables:
                        PRECC      (member_id, time, lat, lon) float32 ...
                        TREFHT     (member_id, time, lat, lon) float32 ...
                    Attributes:
                        source:                  CAM
                        logname:                 sunseon
                        Conventions:             CF-1.0
                        time_period_freq:        month_1
                        topography_file:         /mnt/lustre/share/CESM/cesm_input/atm/cam/topo/f...
                        model_doi_url:           https://doi.org/10.5065/D67H1H0V
                        intake_esm_dataset_key:  atm/historical/monthly/smbb
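
The hierarchy in the repr above mirrors the `intake_esm_dataset_key` attribute stored on each leaf. As a rough illustration only, here is a minimal sketch that nests those slash-separated keys into plain dicts. It does not use the datatree API, and `build_tree` is a hypothetical helper name introduced just for this example.

```python
def build_tree(keys):
    """Nest slash-separated dataset keys into dicts of dicts."""
    root = {}
    for key in keys:
        node = root
        for part in key.split("/"):
            # Reuse the child node if it exists, otherwise create it.
            node = node.setdefault(part, {})
    return root


# Keys copied from the intake_esm_dataset_key attributes shown in the repr.
keys = [
    "ocn/historical/monthly/cmip6",
    "ocn/historical/monthly/smbb",
    "ocn/ssp370/monthly/cmip6",
    "ocn/ssp370/monthly/smbb",
    "atm/ssp370/monthly/cmip6",
    "atm/ssp370/monthly/smbb",
    "atm/historical/monthly/cmip6",
    "atm/historical/monthly/smbb",
]

tree = build_tree(keys)
print(list(tree))                 # ['ocn', 'atm']
print(list(tree["ocn"]))          # ['historical', 'ssp370']
```

Each path component becomes one level of nesting, which is why `monthly` appears as its own node even though it has no siblings.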

andersy005 • Aug 04 '22 21:08

This looks great @andersy005 !

Code-wise I see no issues, and would be happy to merge.

The only thing I might want to change is the data itself: can we simplify it slightly? The raw data has obscure variable names (PRECC?), unneeded dimensions (e.g. member_id), and extra nesting (monthly only has one entry). For tutorial data we might want to clean it up a bit and re-upload it. If I clean it locally myself, is there an easy way to replace what's in the carbonplan bucket? (Maybe we could also just put it straight in https://github.com/pydata/xarray-data too...)

TomNicholas • Aug 08 '22 21:08

> The raw data has obscure variable names (PRECC?)

That's just the CESM naming convention, which isn't CF-compliant. We can exclude this dataset; the CMIP6 version, which includes multiple models and multiple experiments, should suffice.

> unneeded dimensions (e.g. member_id), extra nesting (monthly only has one entry)

I was trying to maintain the dimensionality of the original dataset, but I can easily get rid of those.
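
To make the proposed cleanup concrete, here is a minimal sketch of the two simplifications discussed above, using only plain Python. It is an illustration under stated assumptions, not the code used for the actual sample data: `drop_constant_levels` and `squeeze_dims` are hypothetical helper names, and the keys and dimension sizes are copied from the repr in the PR description.

```python
def drop_constant_levels(keys):
    """Remove path levels that take the same value in every key."""
    if not keys:
        return []
    split = [key.split("/") for key in keys]
    depth = len(split[0])
    if any(len(parts) != depth for parts in split):
        raise ValueError("all keys must have the same number of levels")
    # Keep only the levels where at least two distinct values occur.
    keep = [i for i in range(depth) if len({parts[i] for parts in split}) > 1]
    return ["/".join(parts[i] for i in keep) for parts in split]


def squeeze_dims(sizes):
    """Drop dimensions of length 1 from a {name: size} mapping."""
    return {name: size for name, size in sizes.items() if size != 1}


keys = [
    "ocn/historical/monthly/cmip6",
    "ocn/ssp370/monthly/smbb",
    "atm/ssp370/monthly/cmip6",
    "atm/historical/monthly/smbb",
]
cleaned = drop_constant_levels(keys)
print(cleaned[0])  # ocn/historical/cmip6

# Dimension sizes of the atm datasets in the repr.
sizes = {"member_id": 1, "time": 6, "lat": 192, "lon": 288, "nbnd": 2}
print(squeeze_dims(sizes))  # {'time': 6, 'lat': 192, 'lon': 288, 'nbnd': 2}
```

On real datasets the equivalent steps would more likely go through xarray itself, for example `Dataset.squeeze` for length-1 dimensions and `Dataset.rename` for clearer variable names.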

andersy005 • Aug 08 '22 22:08

I'm going to trim down the CMIP6 sample and add it to the pydata/xarray-data repository.

  • https://github.com/pydata/xarray-data/pull/27

andersy005 • Aug 08 '22 22:08

@TomNicholas, here's what the CMIP sample looks like:

CMIP6 Sample
DataTree('None', parent=None)
├── DataTree('CMIP')
│   ├── DataTree('CCCma')
│   │   └── DataTree('CanESM5')
│   │       └── DataTree('historical')
│   │           ├── DataTree('Amon')
│   │           │   └── DataTree('gn')
│   │           │           Dimensions:    (lat: 64, bnds: 2, lon: 128, time: 6)
│   │           │           Coordinates:
│   │           │             * lat        (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 82.31 85.1 87.86
│   │           │               lat_bnds   (lat, bnds) float64 -90.0 -86.58 -86.58 ... 86.58 86.58 90.0
│   │           │             * lon        (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
│   │           │               lon_bnds   (lon, bnds) float64 -1.406 1.406 1.406 ... 355.8 355.8 358.6
│   │           │             * time       (time) object 1850-01-16 12:00:00 ... 1850-06-16 00:00:00
│   │           │               time_bnds  (time, bnds) object 1850-01-01 00:00:00 ... 1850-07-01 00:00:00
│   │           │               member_id  <U8 'r1i1p1f1'
│   │           │           Dimensions without coordinates: bnds
│   │           │           Data variables:
│   │           │               pr         (time, lat, lon) float32 7.221e-07 8.962e-07 ... 1.108e-05
│   │           │           Attributes: (12/57)
│   │           │               CCCma_model_hash:            3dedf95315d603326fde4f5340dc0519d80d10c0
│   │           │               CCCma_parent_runid:          rc3-pictrl
│   │           │               CCCma_pycmor_hash:           33c30511acc319a98240633965a04ca99c26427e
│   │           │               CCCma_runid:                 rc3.1-his01
│   │           │               Conventions:                 CF-1.7 CMIP-6.2
│   │           │               YMDH_branch_time_in_child:   1850:01:01:00
│   │           │               ...                          ...
│   │           │               variant_label:               r1i1p1f1
│   │           │               version:                     v20190429
│   │           │               status:                      2019-10-25;created;by [email protected]
│   │           │               netcdf_tracking_ids:         hdl:21.14100/363e1ebe-46e7-43dc-9feb-a7a4a0c...
│   │           │               version_id:                  v20190429
│   │           │               intake_esm_dataset_key:      CMIP/CCCma/CanESM5/historical/Amon/gn
│   │           ├── DataTree('Lmon')
│   │           │   └── DataTree('gn')
│   │           │           Dimensions:    (time: 6, lat: 64, lon: 128, bnds: 2)
│   │           │           Coordinates:
│   │           │             * lat        (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 82.31 85.1 87.86
│   │           │               lat_bnds   (lat, bnds) float64 -90.0 -86.58 -86.58 ... 86.58 86.58 90.0
│   │           │             * lon        (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
│   │           │               lon_bnds   (lon, bnds) float64 -1.406 1.406 1.406 ... 355.8 355.8 358.6
│   │           │             * time       (time) object 1850-01-16 12:00:00 ... 1850-06-16 00:00:00
│   │           │               time_bnds  (time, bnds) object 1850-01-01 00:00:00 ... 1850-07-01 00:00:00
│   │           │               member_id  <U8 'r1i1p1f1'
│   │           │           Dimensions without coordinates: bnds
│   │           │           Data variables:
│   │           │               gpp        (time, lat, lon) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
│   │           │               mrso       (time, lat, lon) float32 3.76e+03 3.76e+03 3.76e+03 ... 0.0 0.0
│   │           │           Attributes: (12/53)
│   │           │               variant_label:               r1i1p1f1
│   │           │               mip_era:                     CMIP6
│   │           │               license:                     CMIP6 model data produced by The Government ...
│   │           │               contact:                     [email protected]
│   │           │               parent_variant_label:        r1i1p1f1
│   │           │               source_type:                 AOGCM
│   │           │               ...                          ...
│   │           │               realm:                       land
│   │           │               branch_time_in_child:        0.0
│   │           │               source:                      CanESM5 (2019): \naerosol: interactive\natmo...
│   │           │               initialization_index:        1
│   │           │               further_info_url:            https://furtherinfo.es-doc.org/CMIP6.CCCma.C...
│   │           │               intake_esm_dataset_key:      CMIP/CCCma/CanESM5/historical/Lmon/gn
│   │           └── DataTree('Omon')
│   │               └── DataTree('gn')
│   │                       Dimensions:             (i: 360, j: 291, bnds: 2, time: 6, vertices: 4)
│   │                       Coordinates:
│   │                         * i                   (i) int32 0 1 2 3 4 5 6 ... 353 354 355 356 357 358 359
│   │                         * j                   (j) int32 0 1 2 3 4 5 6 ... 284 285 286 287 288 289 290
│   │                           latitude            (j, i) float64 -78.39 -78.39 -78.39 ... 50.23 50.01
│   │                           lev                 float64 3.047
│   │                           lev_bnds            (bnds) float64 0.0 6.194
│   │                           longitude           (j, i) float64 73.5 74.5 75.5 76.5 ... 72.95 72.96 72.99
│   │                         * time                (time) object 1850-01-16 12:00:00 ... 1850-06-16 00:0...
│   │                           time_bnds           (time, bnds) object 1850-01-01 00:00:00 ... 1850-07-0...
│   │                           member_id           <U8 'r1i1p1f1'
│   │                       Dimensions without coordinates: bnds, vertices
│   │                       Data variables:
│   │                           no3                 (time, j, i) float32 nan nan nan nan ... nan nan nan nan
│   │                           vertices_latitude   (j, i, vertices) float64 -78.29 -78.49 ... 50.11 50.11
│   │                           vertices_longitude  (j, i, vertices) float64 74.0 74.0 73.0 ... 72.95 73.0
│   │                           thetao              (time, j, i) float32 nan nan nan nan ... nan nan nan nan
│   │                       Attributes: (12/52)
│   │                           variant_label:               r1i1p1f1
│   │                           mip_era:                     CMIP6
│   │                           license:                     CMIP6 model data produced by The Government ...
│   │                           contact:                     [email protected]
│   │                           parent_variant_label:        r1i1p1f1
│   │                           source_type:                 AOGCM
│   │                           ...                          ...
│   │                           physics_index:               1
│   │                           branch_time_in_child:        0.0
│   │                           source:                      CanESM5 (2019): \naerosol: interactive\natmo...
│   │                           initialization_index:        1
│   │                           further_info_url:            https://furtherinfo.es-doc.org/CMIP6.CCCma.C...
│   │                           intake_esm_dataset_key:      CMIP/CCCma/CanESM5/historical/Omon/gn
│   ├── DataTree('MIROC')
│   │   └── DataTree('MIROC6')
│   │       └── DataTree('historical')
│   │           ├── DataTree('Lmon')
│   │           │   └── DataTree('gn')
│   │           │           Dimensions:    (lat: 128, bnds: 2, lon: 256, time: 6)
│   │           │           Coordinates:
│   │           │             * lat        (lat) float64 -88.93 -87.54 -86.14 -84.74 ... 86.14 87.54 88.93
│   │           │               lat_bnds   (lat, bnds) float64 -90.0 -88.28 -88.28 ... 88.28 88.28 90.0
│   │           │             * lon        (lon) float64 0.0 1.406 2.812 4.219 ... 354.4 355.8 357.2 358.6
│   │           │               lon_bnds   (lon, bnds) float64 -0.7031 0.7031 0.7031 ... 357.9 357.9 359.3
│   │           │             * time       (time) datetime64[ns] 1850-01-16T12:00:00 ... 1850-06-16
│   │           │               time_bnds  (time, bnds) datetime64[ns] 1850-01-01 1850-02-01 ... 1850-07-01
│   │           │               member_id  <U8 'r1i1p1f1'
│   │           │           Dimensions without coordinates: bnds
│   │           │           Data variables:
│   │           │               mrso       (time, lat, lon) float32 4.2e+03 4.2e+03 4.2e+03 ... nan nan nan
│   │           │           Attributes: (12/48)
│   │           │               Conventions:             CF-1.7 CMIP-6.2
│   │           │               activity_id:             CMIP
│   │           │               branch_method:           standard
│   │           │               branch_time_in_child:    0.0
│   │           │               branch_time_in_parent:   0.0
│   │           │               cmor_version:            3.3.2
│   │           │               ...                      ...
│   │           │               variable_id:             mrso
│   │           │               variant_label:           r1i1p1f1
│   │           │               status:                  2019-10-25;created;by [email protected]
│   │           │               netcdf_tracking_ids:     hdl:21.14100/a702781b-b6d9-4f90-a65d-c649d59a224...
│   │           │               version_id:              v20190311
│   │           │               intake_esm_dataset_key:  CMIP/MIROC/MIROC6/historical/Lmon/gn
│   │           ├── DataTree('Amon')
│   │           │   └── DataTree('gn')
│   │           │           Dimensions:    (lat: 128, bnds: 2, lon: 256, time: 6)
│   │           │           Coordinates:
│   │           │             * lat        (lat) float64 -88.93 -87.54 -86.14 -84.74 ... 86.14 87.54 88.93
│   │           │               lat_bnds   (lat, bnds) float64 -90.0 -88.28 -88.28 ... 88.28 88.28 90.0
│   │           │             * lon        (lon) float64 0.0 1.406 2.812 4.219 ... 354.4 355.8 357.2 358.6
│   │           │               lon_bnds   (lon, bnds) float64 -0.7031 0.7031 0.7031 ... 357.9 357.9 359.3
│   │           │             * time       (time) datetime64[ns] 1850-01-16T12:00:00 ... 1850-06-16
│   │           │               time_bnds  (time, bnds) datetime64[ns] 1850-01-01 1850-02-01 ... 1850-07-01
│   │           │               member_id  <U8 'r1i1p1f1'
│   │           │           Dimensions without coordinates: bnds
│   │           │           Data variables:
│   │           │               pr         (time, lat, lon) float32 2.144e-06 2.169e-06 ... 8.586e-06
│   │           │           Attributes: (12/48)
│   │           │               Conventions:             CF-1.7 CMIP-6.2
│   │           │               activity_id:             CMIP
│   │           │               branch_method:           standard
│   │           │               branch_time_in_child:    0.0
│   │           │               branch_time_in_parent:   0.0
│   │           │               cmor_version:            3.3.2
│   │           │               ...                      ...
│   │           │               variable_id:             pr
│   │           │               variant_label:           r1i1p1f1
│   │           │               status:                  2019-10-25;created;by [email protected]
│   │           │               netcdf_tracking_ids:     hdl:21.14100/61fa8b6b-e74c-4e86-9344-8ba946ee8a8...
│   │           │               version_id:              v20181212
│   │           │               intake_esm_dataset_key:  CMIP/MIROC/MIROC6/historical/Amon/gn
│   │           └── DataTree('Omon')
│   │               └── DataTree('gn')
│   │                       Dimensions:             (y: 256, x: 360, time: 6, bnds: 2, vertices: 4)
│   │                       Coordinates: (12/13)
│   │                           latitude            (y, x) float32 -88.0 -88.0 -88.0 ... 64.43 64.0 63.56
│   │                           lev                 float64 1.0
│   │                           lev_bnds            (bnds) float64 0.0 2.0
│   │                           longitude           (y, x) float32 60.5 61.5 62.5 63.5 ... 59.96 59.98 59.99
│   │                           sigma_bnds          (bnds) float64 -0.0 -0.04
│   │                         * time                (time) datetime64[ns] 1850-01-16T12:00:00 ... 1850-06-16
│   │                           ...                  ...
│   │                         * x                   (x) float64 0.5 1.5 2.5 3.5 ... 356.5 357.5 358.5 359.5
│   │                           x_bnds              (x, bnds) float64 0.0 1.0 1.0 2.0 ... 359.0 359.0 360.0
│   │                         * y                   (y) float64 -88.0 -85.75 -85.25 ... 148.6 150.5 152.4
│   │                           y_bnds              (y, bnds) float64 -90.0 -86.0 -86.0 ... 151.5 153.3
│   │                           zlev_bnds           (bnds) float64 -0.0 -2.0
│   │                           member_id           <U8 'r1i1p1f1'
│   │                       Dimensions without coordinates: bnds, vertices
│   │                       Data variables:
│   │                           depth               (y, x) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
│   │                           depth_c             float64 50.0
│   │                           eta                 (time, y, x) float32 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
│   │                           nsigma              int32 10
│   │                           sigma               float64 -0.02
│   │                           thetao              (time, y, x) float32 nan nan nan nan ... nan nan nan nan
│   │                           vertices_latitude   (y, x, vertices) float32 -90.0 -90.0 ... 63.33 63.78
│   │                           vertices_longitude  (y, x, vertices) float32 60.0 61.0 61.0 ... 60.0 60.0
│   │                           zlev                float64 -1.0
│   │                       Attributes: (12/48)
│   │                           Conventions:             CF-1.7 CMIP-6.2
│   │                           activity_id:             CMIP
│   │                           branch_method:           standard
│   │                           branch_time_in_child:    0.0
│   │                           branch_time_in_parent:   0.0
│   │                           cmor_version:            3.3.2
│   │                           ...                      ...
│   │                           variable_id:             thetao
│   │                           variant_label:           r1i1p1f1
│   │                           status:                  2019-11-08;created;by [email protected]
│   │                           netcdf_tracking_ids:     hdl:21.14100/16598b35-19b4-49e3-98de-27b9e9444ad...
│   │                           version_id:              v20190311
│   │                           intake_esm_dataset_key:  CMIP/MIROC/MIROC6/historical/Omon/gn
│   └── DataTree('NCAR')
│       └── DataTree('CESM2-WACCM')
│           └── DataTree('historical')
│               ├── DataTree('Omon')
│               │   ├── DataTree('gr')
│               │   │       Dimensions:    (lat: 180, d2: 2, lon: 360, time: 6)
│               │   │       Coordinates:
│               │   │         * lat        (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
│               │   │           lat_bnds   (lat, d2) float64 -90.0 -89.0 -89.0 -88.0 ... 88.0 89.0 89.0 90.0
│               │   │           lev        float64 0.0
│               │   │           lev_bnds   (d2) float64 0.0 5.0
│               │   │         * lon        (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
│               │   │           lon_bnds   (lon, d2) float64 0.0 1.0 1.0 2.0 2.0 ... 358.0 359.0 359.0 360.0
│               │   │         * time       (time) object 1850-01-15 12:59:59.999997 ... 1850-06-15 00:00:00
│               │   │           time_bnds  (time, d2) object 1850-01-01 02:00:00.000003 ... 1850-07-01 00...
│               │   │           member_id  <U8 'r1i1p1f1'
│               │   │       Dimensions without coordinates: d2
│               │   │       Data variables:
│               │   │           no3        (time, lat, lon) float32 nan nan nan ... 0.006828 0.006827
│               │   │           thetao     (time, lat, lon) float32 nan nan nan nan ... -1.763 -1.763 -1.762
│               │   │       Attributes: (12/45)
│               │   │           variant_label:           r1i1p1f1
│               │   │           mip_era:                 CMIP6
│               │   │           license:                 CMIP6 model data produced by <The National Cente...
│               │   │           contact:                 [email protected]
│               │   │           parent_variant_label:    r1i1p1f1
│               │   │           source_type:             AOGCM BGC CHEM AER
│               │   │           ...                      ...
│               │   │           case_id:                 4
│               │   │           branch_time_in_child:    674885.0
│               │   │           source:                  CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite ...
│               │   │           initialization_index:    1
│               │   │           further_info_url:        https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2-...
│               │   │           intake_esm_dataset_key:  CMIP/NCAR/CESM2-WACCM/historical/Omon/gr
│               │   └── DataTree('gn')
│               │           Dimensions:    (nlat: 384, nlon: 320, vertices: 4, d2: 2, time: 6)
│               │           Coordinates:
│               │               lat        (nlat, nlon) float64 -79.22 -79.22 -79.22 ... 72.2 72.19 72.19
│               │               lat_bnds   (nlat, nlon, vertices) float32 -79.49 -79.49 ... 72.41 72.41
│               │               lev        float64 500.0
│               │               lev_bnds   (d2) float32 0.0 10.0
│               │               lon        (nlat, nlon) float64 320.6 321.7 322.8 ... 318.9 319.4 319.8
│               │               lon_bnds   (nlat, nlon, vertices) float32 320.0 321.1 321.1 ... 320.0 319.6
│               │             * nlat       (nlat) int32 1 2 3 4 5 6 7 8 ... 377 378 379 380 381 382 383 384
│               │             * nlon       (nlon) int32 1 2 3 4 5 6 7 8 ... 313 314 315 316 317 318 319 320
│               │             * time       (time) object 1850-01-15 13:00:00 ... 1850-06-15 00:00:00
│               │               time_bnds  (time, d2) object 1850-01-01 02:00:00.000003 ... 1850-07-01 00...
│               │               member_id  <U8 'r1i1p1f1'
│               │           Dimensions without coordinates: vertices, d2
│               │           Data variables:
│               │               no3        (time, nlat, nlon) float32 nan nan nan nan ... nan nan nan nan
│               │               thetao     (time, nlat, nlon) float32 nan nan nan nan ... nan nan nan nan
│               │           Attributes: (12/45)
│               │               variant_label:           r1i1p1f1
│               │               mip_era:                 CMIP6
│               │               license:                 CMIP6 model data produced by <The National Cente...
│               │               contact:                 [email protected]
│               │               parent_variant_label:    r1i1p1f1
│               │               source_type:             AOGCM BGC CHEM AER
│               │               ...                      ...
│               │               case_id:                 4
│               │               branch_time_in_child:    674885.0
│               │               source:                  CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite ...
│               │               initialization_index:    1
│               │               further_info_url:        https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2-...
│               │               intake_esm_dataset_key:  CMIP/NCAR/CESM2-WACCM/historical/Omon/gn
│               ├── DataTree('Amon')
│               │   └── DataTree('gn')
│               │           Dimensions:    (time: 6, lat: 192, lon: 288, nbnd: 2)
│               │           Coordinates:
│               │             * lat        (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
│               │               lat_bnds   (lat, nbnd) float64 -90.0 -89.53 -89.53 ... 89.53 89.53 90.0
│               │             * lon        (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
│               │               lon_bnds   (lon, nbnd) float64 -0.625 0.625 0.625 ... 358.1 358.1 359.4
│               │               plev       float64 1e+05
│               │             * time       (time) object 1850-01-15 12:00:00 ... 1850-06-15 00:00:00
│               │               time_bnds  (time, nbnd) object 1850-01-01 00:00:00 ... 1850-07-01 00:00:00
│               │               member_id  <U8 'r1i1p1f1'
│               │           Dimensions without coordinates: nbnd
│               │           Data variables:
│               │               co2        (time, lat, lon) float32 nan nan nan ... 0.0002868 0.0002868
│               │               pr         (time, lat, lon) float32 2.706e-06 2.706e-06 ... 4.324e-06
│               │           Attributes: (12/46)
│               │               variant_label:           r1i1p1f1
│               │               mip_era:                 CMIP6
│               │               license:                 CMIP6 model data produced by <The National Cente...
│               │               contact:                 [email protected]
│               │               parent_variant_label:    r1i1p1f1
│               │               source_type:             AOGCM BGC CHEM AER
│               │               ...                      ...
│               │               case_id:                 4
│               │               branch_time_in_child:    674885.0
│               │               source:                  CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite ...
│               │               initialization_index:    1
│               │               further_info_url:        https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2-...
│               │               intake_esm_dataset_key:  CMIP/NCAR/CESM2-WACCM/historical/Amon/gn
│               └── DataTree('Lmon')
│                   └── DataTree('gn')
│                           Dimensions:    (time: 6, lat: 192, lon: 288, hist_interval: 2)
│                           Coordinates:
│                             * lat        (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
│                               lat_bnds   (lat, hist_interval) float32 -90.0 -89.53 -89.53 ... 89.53 90.0
│                             * lon        (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
│                               lon_bnds   (lon, hist_interval) float32 -0.625 0.625 0.625 ... 358.1 359.4
│                             * time       (time) object 1850-01-15 11:45:00.000013 ... 1850-06-15 00:00:00
│                               time_bnds  (time, hist_interval) object 1849-12-31 23:29:59.999987 ... 18...
│                               member_id  <U8 'r1i1p1f1'
│                           Dimensions without coordinates: hist_interval
│                           Data variables:
│                               gpp        (time, lat, lon) float32 0.0 0.0 0.0 0.0 0.0 ... nan nan nan nan
│                               mrso       (time, lat, lon) float32 nan nan nan nan nan ... nan nan nan nan
│                           Attributes: (12/46)
│                               variant_label:           r1i1p1f1
│                               mip_era:                 CMIP6
│                               license:                 CMIP6 model data produced by <The National Cente...
│                               contact:                 [email protected]
│                               parent_variant_label:    r1i1p1f1
│                               source_type:             AOGCM BGC CHEM AER
│                               ...                      ...
│                               case_id:                 4
│                               branch_time_in_child:    674885.0
│                               source:                  CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite ...
│                               initialization_index:    1
│                               further_info_url:        https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2-...
│                               intake_esm_dataset_key:  CMIP/NCAR/CESM2-WACCM/historical/Lmon/gn
└── DataTree('ScenarioMIP')
    ├── DataTree('MIROC')
    │   └── DataTree('MIROC6')
    │       └── DataTree('ssp370')
    │           ├── DataTree('Omon')
    │           │   └── DataTree('gn')
    │           │           Dimensions:             (y: 256, x: 360, time: 6, bnds: 2, vertices: 4)
    │           │           Coordinates: (12/13)
    │           │               latitude            (y, x) float32 -88.0 -88.0 -88.0 ... 64.43 64.0 63.56
    │           │               lev                 float64 1.0
    │           │               lev_bnds            (bnds) float64 0.0 2.0
    │           │               longitude           (y, x) float32 60.5 61.5 62.5 63.5 ... 59.96 59.98 59.99
    │           │               sigma_bnds          (bnds) float64 -0.0 -0.04
    │           │             * time                (time) datetime64[ns] 2015-01-16T12:00:00 ... 2015-06-16
    │           │               ...                  ...
    │           │             * x                   (x) float64 0.5 1.5 2.5 3.5 ... 356.5 357.5 358.5 359.5
    │           │               x_bnds              (x, bnds) float64 0.0 1.0 1.0 2.0 ... 359.0 359.0 360.0
    │           │             * y                   (y) float64 -88.0 -85.75 -85.25 ... 148.6 150.5 152.4
    │           │               y_bnds              (y, bnds) float64 -90.0 -86.0 -86.0 ... 151.5 153.3
    │           │               zlev_bnds           (bnds) float64 -0.0 -2.0
    │           │               member_id           <U8 'r1i1p1f1'
    │           │           Dimensions without coordinates: bnds, vertices
    │           │           Data variables:
    │           │               depth               (y, x) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
    │           │               depth_c             float64 50.0
    │           │               eta                 (time, y, x) float32 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
    │           │               nsigma              int32 10
    │           │               sigma               float64 -0.02
    │           │               thetao              (time, y, x) float32 nan nan nan nan ... nan nan nan nan
    │           │               vertices_latitude   (y, x, vertices) float32 -90.0 -90.0 ... 63.33 63.78
    │           │               vertices_longitude  (y, x, vertices) float32 60.0 61.0 61.0 ... 60.0 60.0
    │           │               zlev                float64 -1.0
    │           │           Attributes: (12/48)
    │           │               Conventions:             CF-1.7 CMIP-6.2
    │           │               activity_id:             ScenarioMIP AerChemMIP
    │           │               branch_method:           standard
    │           │               branch_time_in_child:    60265.0
    │           │               branch_time_in_parent:   60265.0
    │           │               cmor_version:            3.4.0
    │           │               ...                      ...
    │           │               variable_id:             thetao
    │           │               variant_label:           r1i1p1f1
    │           │               status:                  2019-11-18;created;by [email protected]
    │           │               netcdf_tracking_ids:     hdl:21.14100/99dda520-c9e9-4617-b4ca-0de0a2b9398...
    │           │               version_id:              v20190627
    │           │               intake_esm_dataset_key:  ScenarioMIP/MIROC/MIROC6/ssp370/Omon/gn
    │           ├── DataTree('Amon')
    │           │   └── DataTree('gn')
    │           │           Dimensions:    (lat: 128, bnds: 2, lon: 256, time: 6)
    │           │           Coordinates:
    │           │             * lat        (lat) float64 -88.93 -87.54 -86.14 -84.74 ... 86.14 87.54 88.93
    │           │               lat_bnds   (lat, bnds) float64 -90.0 -88.28 -88.28 ... 88.28 88.28 90.0
    │           │             * lon        (lon) float64 0.0 1.406 2.812 4.219 ... 354.4 355.8 357.2 358.6
    │           │               lon_bnds   (lon, bnds) float64 -0.7031 0.7031 0.7031 ... 357.9 357.9 359.3
    │           │             * time       (time) datetime64[ns] 2015-01-16T12:00:00 ... 2015-06-16
    │           │               time_bnds  (time, bnds) datetime64[ns] 2015-01-01 2015-02-01 ... 2015-07-01
    │           │               member_id  <U8 'r1i1p1f1'
    │           │           Dimensions without coordinates: bnds
    │           │           Data variables:
    │           │               pr         (time, lat, lon) float32 1.137e-06 1.131e-06 ... 7.446e-06
    │           │           Attributes: (12/48)
    │           │               Conventions:             CF-1.7 CMIP-6.2
    │           │               activity_id:             ScenarioMIP AerChemMIP
    │           │               branch_method:           standard
    │           │               branch_time_in_child:    60265.0
    │           │               branch_time_in_parent:   60265.0
    │           │               cmor_version:            3.4.0
    │           │               ...                      ...
    │           │               variable_id:             pr
    │           │               variant_label:           r1i1p1f1
    │           │               status:                  2019-10-25;created;by [email protected]
    │           │               netcdf_tracking_ids:     hdl:21.14100/c23c415d-adca-4e01-8e7c-11617bcfa2bb
    │           │               version_id:              v20190627
    │           │               intake_esm_dataset_key:  ScenarioMIP/MIROC/MIROC6/ssp370/Amon/gn
    │           └── DataTree('Lmon')
    │               └── DataTree('gn')
    │                       Dimensions:    (lat: 128, bnds: 2, lon: 256, time: 6)
    │                       Coordinates:
    │                         * lat        (lat) float64 -88.93 -87.54 -86.14 -84.74 ... 86.14 87.54 88.93
    │                           lat_bnds   (lat, bnds) float64 -90.0 -88.28 -88.28 ... 88.28 88.28 90.0
    │                         * lon        (lon) float64 0.0 1.406 2.812 4.219 ... 354.4 355.8 357.2 358.6
    │                           lon_bnds   (lon, bnds) float64 -0.7031 0.7031 0.7031 ... 357.9 357.9 359.3
    │                         * time       (time) datetime64[ns] 2015-01-16T12:00:00 ... 2015-06-16
    │                           time_bnds  (time, bnds) datetime64[ns] 2015-01-01 2015-02-01 ... 2015-07-01
    │                           member_id  <U8 'r1i1p1f1'
    │                       Dimensions without coordinates: bnds
    │                       Data variables:
    │                           mrso       (time, lat, lon) float32 4.2e+03 4.2e+03 4.2e+03 ... nan nan nan
    │                       Attributes: (12/48)
    │                           Conventions:             CF-1.7 CMIP-6.2
    │                           activity_id:             ScenarioMIP AerChemMIP
    │                           branch_method:           standard
    │                           branch_time_in_child:    60265.0
    │                           branch_time_in_parent:   60265.0
    │                           cmor_version:            3.4.0
    │                           ...                      ...
    │                           variable_id:             mrso
    │                           variant_label:           r1i1p1f1
    │                           status:                  2019-10-29;created;by [email protected]
    │                           netcdf_tracking_ids:     hdl:21.14100/3ba01dc3-ab7e-45d0-882a-66ed2768a642
    │                           version_id:              v20190627
    │                           intake_esm_dataset_key:  ScenarioMIP/MIROC/MIROC6/ssp370/Lmon/gn
    ├── DataTree('NCAR')
    │   └── DataTree('CESM2-WACCM')
    │       └── DataTree('ssp370')
    │           ├── DataTree('Amon')
    │           │   └── DataTree('gn')
    │           │           Dimensions:    (time: 6, lat: 192, lon: 288, nbnd: 2)
    │           │           Coordinates:
    │           │             * lat        (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
    │           │               lat_bnds   (lat, nbnd) float64 -90.0 -89.53 -89.53 ... 89.53 89.53 90.0
    │           │             * lon        (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
    │           │               lon_bnds   (lon, nbnd) float64 -0.625 0.625 0.625 ... 358.1 358.1 359.4
    │           │               plev       float64 1e+05
    │           │             * time       (time) object 2015-01-15 12:00:00 ... 2015-06-15 00:00:00
    │           │               time_bnds  (time, nbnd) object 2015-01-01 00:00:00 ... 2015-07-01 00:00:00
    │           │               member_id  <U8 'r1i1p1f1'
    │           │           Dimensions without coordinates: nbnd
    │           │           Data variables:
    │           │               co2        (time, lat, lon) float32 nan nan nan ... 0.0004034 0.0004034
    │           │               pr         (time, lat, lon) float32 1.919e-06 1.919e-06 ... 1.043e-05
    │           │           Attributes: (12/45)
    │           │               variant_label:           r1i1p1f1
    │           │               mip_era:                 CMIP6
    │           │               license:                 CMIP6 model data produced by <The National Cente...
    │           │               contact:                 [email protected]
    │           │               parent_variant_label:    r1i1p1f1
    │           │               source_type:             AOGCM BGC CHEM AER
    │           │               ...                      ...
    │           │               case_id:                 969
    │           │               branch_time_in_child:    735110.0
    │           │               source:                  CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite ...
    │           │               initialization_index:    1
    │           │               further_info_url:        https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2-...
    │           │               intake_esm_dataset_key:  ScenarioMIP/NCAR/CESM2-WACCM/ssp370/Amon/gn
    │           ├── DataTree('Omon')
    │           │   ├── DataTree('gr')
    │           │   │       Dimensions:    (lat: 180, d2: 2, lon: 360, time: 6)
    │           │   │       Coordinates:
    │           │   │         * lat        (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
    │           │   │           lat_bnds   (lat, d2) float64 -90.0 -89.0 -89.0 -88.0 ... 88.0 89.0 89.0 90.0
    │           │   │           lev        float64 0.0
    │           │   │           lev_bnds   (d2) float64 0.0 5.0
    │           │   │         * lon        (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
    │           │   │           lon_bnds   (lon, d2) float64 0.0 1.0 1.0 2.0 2.0 ... 358.0 359.0 359.0 360.0
    │           │   │         * time       (time) object 2015-01-15 13:00:00.000007 ... 2015-06-15 00:00:00
    │           │   │           time_bnds  (time, d2) object 2015-01-01 02:00:00.000003 ... 2015-07-01 00...
    │           │   │           member_id  <U8 'r1i1p1f1'
    │           │   │       Dimensions without coordinates: d2
    │           │   │       Data variables:
    │           │   │           no3        (time, lat, lon) float32 nan nan nan ... 0.004002 0.004001
    │           │   │           thetao     (time, lat, lon) float32 nan nan nan nan ... -1.68 -1.68 -1.68
    │           │   │       Attributes: (12/44)
    │           │   │           variant_label:           r1i1p1f1
    │           │   │           mip_era:                 CMIP6
    │           │   │           license:                 CMIP6 model data produced by <The National Cente...
    │           │   │           contact:                 [email protected]
    │           │   │           parent_variant_label:    r1i1p1f1
    │           │   │           source_type:             AOGCM BGC CHEM AER
    │           │   │           ...                      ...
    │           │   │           case_id:                 969
    │           │   │           branch_time_in_child:    735110.0
    │           │   │           source:                  CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite ...
    │           │   │           initialization_index:    1
    │           │   │           further_info_url:        https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2-...
    │           │   │           intake_esm_dataset_key:  ScenarioMIP/NCAR/CESM2-WACCM/ssp370/Omon/gr
    │           │   └── DataTree('gn')
    │           │           Dimensions:    (nlat: 384, nlon: 320, vertices: 4, d2: 2, time: 6)
    │           │           Coordinates:
    │           │               lat        (nlat, nlon) float64 -79.22 -79.22 -79.22 ... 72.2 72.19 72.19
    │           │               lat_bnds   (nlat, nlon, vertices) float32 -79.49 -79.49 ... 72.41 72.41
    │           │               lev        float64 500.0
    │           │               lev_bnds   (d2) float32 0.0 10.0
    │           │               lon        (nlat, nlon) float64 320.6 321.7 322.8 ... 318.9 319.4 319.8
    │           │               lon_bnds   (nlat, nlon, vertices) float32 320.0 321.1 321.1 ... 320.0 319.6
    │           │             * nlat       (nlat) int32 1 2 3 4 5 6 7 8 ... 377 378 379 380 381 382 383 384
    │           │             * nlon       (nlon) int32 1 2 3 4 5 6 7 8 ... 313 314 315 316 317 318 319 320
    │           │             * time       (time) object 2015-01-15 13:00:00.000007 ... 2015-06-15 00:00:00
    │           │               time_bnds  (time, d2) object 2015-01-01 02:00:00.000003 ... 2015-07-01 00...
    │           │               member_id  <U8 'r1i1p1f1'
    │           │           Dimensions without coordinates: vertices, d2
    │           │           Data variables:
    │           │               no3        (time, nlat, nlon) float32 nan nan nan nan ... nan nan nan nan
    │           │               thetao     (time, nlat, nlon) float32 nan nan nan nan ... nan nan nan nan
    │           │           Attributes: (12/44)
    │           │               variant_label:           r1i1p1f1
    │           │               mip_era:                 CMIP6
    │           │               license:                 CMIP6 model data produced by <The National Cente...
    │           │               contact:                 [email protected]
    │           │               parent_variant_label:    r1i1p1f1
    │           │               source_type:             AOGCM BGC CHEM AER
    │           │               ...                      ...
    │           │               case_id:                 969
    │           │               branch_time_in_child:    735110.0
    │           │               source:                  CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite ...
    │           │               initialization_index:    1
    │           │               further_info_url:        https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2-...
    │           │               intake_esm_dataset_key:  ScenarioMIP/NCAR/CESM2-WACCM/ssp370/Omon/gn
    │           └── DataTree('Lmon')
    │               └── DataTree('gn')
    │                       Dimensions:    (lat: 192, lon: 288, time: 6, hist_interval: 2)
    │                       Coordinates:
    │                         * lat        (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
    │                         * lon        (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
    │                         * time       (time) object 2015-01-15 11:45:00 ... 2015-05-15 12:00:00
    │                           member_id  <U8 'r1i1p1f1'
    │                           lat_bnds   (lat, hist_interval) float32 -90.0 -89.53 -89.53 ... 89.53 90.0
    │                           lon_bnds   (lon, hist_interval) float32 -0.625 0.625 0.625 ... 358.1 359.4
    │                           time_bnds  (time, hist_interval) object 2014-12-31 23:29:59.999997 ... 20...
    │                       Dimensions without coordinates: hist_interval
    │                       Data variables:
    │                           gpp        (time, lat, lon) float32 nan nan nan nan nan ... nan nan nan nan
    │                           mrso       (time, lat, lon) float32 nan nan nan nan nan ... nan nan nan nan
    │                       Attributes: (12/45)
    │                           variant_label:           r1i1p1f1
    │                           mip_era:                 CMIP6
    │                           license:                 CMIP6 model data produced by <The National Cente...
    │                           contact:                 [email protected]
    │                           parent_variant_label:    r1i1p1f1
    │                           source_type:             AOGCM BGC CHEM AER
    │                           ...                      ...
    │                           case_id:                 969
    │                           branch_time_in_child:    735110.0
    │                           source:                  CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite ...
    │                           initialization_index:    1
    │                           further_info_url:        https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2-...
    │                           intake_esm_dataset_key:  ScenarioMIP/NCAR/CESM2-WACCM/ssp370/Lmon/gn
    └── DataTree('CCCma')
        └── DataTree('CanESM5')
            └── DataTree('ssp370')
                ├── DataTree('Amon')
                │   └── DataTree('gn')
                │           Dimensions:    (lat: 64, bnds: 2, lon: 128, time: 6)
                │           Coordinates:
                │             * lat        (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 82.31 85.1 87.86
                │               lat_bnds   (lat, bnds) float64 -90.0 -86.58 -86.58 ... 86.58 86.58 90.0
                │             * lon        (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
                │               lon_bnds   (lon, bnds) float64 -1.406 1.406 1.406 ... 355.8 355.8 358.6
                │             * time       (time) object 2015-01-16 12:00:00 ... 2015-06-16 00:00:00
                │               time_bnds  (time, bnds) object 2015-01-01 00:00:00 ... 2015-07-01 00:00:00
                │               member_id  <U8 'r1i1p1f1'
                │           Dimensions without coordinates: bnds
                │           Data variables:
                │               pr         (time, lat, lon) float32 2.504e-06 2.678e-06 ... 6.46e-06
                │           Attributes: (12/57)
                │               CCCma_model_hash:            1f91f92cb6d607391f44831504025d32fc44faa1
                │               CCCma_parent_runid:          rc3.1-his01
                │               CCCma_pycmor_hash:           33c30511acc319a98240633965a04ca99c26427e
                │               CCCma_runid:                 rc3.1-s7001
                │               Conventions:                 CF-1.7 CMIP-6.2
                │               YMDH_branch_time_in_child:   2015:01:01:00
                │               ...                          ...
                │               tracking_id:                 hdl:21.14100/8c4a1496-f308-493e-8ecc-a2e253e...
                │               variable_id:                 pr
                │               variant_label:               r1i1p1f1
                │               version:                     v20190429
                │               version_id:                  v20190429
                │               intake_esm_dataset_key:      ScenarioMIP/CCCma/CanESM5/ssp370/Amon/gn
                ├── DataTree('Lmon')
                │   └── DataTree('gn')
                │           Dimensions:    (time: 6, lat: 64, lon: 128, bnds: 2)
                │           Coordinates:
                │             * lat        (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 82.31 85.1 87.86
                │               lat_bnds   (lat, bnds) float64 -90.0 -86.58 -86.58 ... 86.58 86.58 90.0
                │             * lon        (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
                │               lon_bnds   (lon, bnds) float64 -1.406 1.406 1.406 ... 355.8 355.8 358.6
                │             * time       (time) object 2015-01-16 12:00:00 ... 2015-06-16 00:00:00
                │               time_bnds  (time, bnds) object 2015-01-01 00:00:00 ... 2015-07-01 00:00:00
                │               member_id  <U8 'r1i1p1f1'
                │           Dimensions without coordinates: bnds
                │           Data variables:
                │               gpp        (time, lat, lon) float32 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
                │               mrso       (time, lat, lon) float32 3.76e+03 3.76e+03 3.76e+03 ... 0.0 0.0
                │           Attributes: (12/53)
                │               variant_label:               r1i1p1f1
                │               mip_era:                     CMIP6
                │               license:                     CMIP6 model data produced by The Government ...
                │               contact:                     [email protected]
                │               parent_variant_label:        r1i1p1f1
                │               source_type:                 AOGCM
                │               ...                          ...
                │               realm:                       land
                │               branch_time_in_child:        60225.0
                │               source:                      CanESM5 (2019): \naerosol: interactive\natmo...
                │               initialization_index:        1
                │               further_info_url:            https://furtherinfo.es-doc.org/CMIP6.CCCma.C...
                │               intake_esm_dataset_key:      ScenarioMIP/CCCma/CanESM5/ssp370/Lmon/gn
                └── DataTree('Omon')
                    └── DataTree('gn')
                            Dimensions:             (i: 360, j: 291, bnds: 2, time: 6, vertices: 4)
                            Coordinates:
                              * i                   (i) int32 0 1 2 3 4 5 6 ... 353 354 355 356 357 358 359
                              * j                   (j) int32 0 1 2 3 4 5 6 ... 284 285 286 287 288 289 290
                                latitude            (j, i) float64 -78.39 -78.39 -78.39 ... 50.23 50.01
                                lev                 float64 3.047
                                lev_bnds            (bnds) float64 0.0 6.194
                                longitude           (j, i) float64 73.5 74.5 75.5 76.5 ... 72.95 72.96 72.99
                              * time                (time) object 2015-01-16 12:00:00 ... 2015-06-16 00:0...
                                time_bnds           (time, bnds) object 2015-01-01 00:00:00 ... 2015-07-0...
                                member_id           <U8 'r1i1p1f1'
                            Dimensions without coordinates: bnds, vertices
                            Data variables:
                                no3                 (time, j, i) float32 nan nan nan nan ... nan nan nan nan
                                vertices_latitude   (j, i, vertices) float64 -78.29 -78.49 ... 50.11 50.11
                                vertices_longitude  (j, i, vertices) float64 74.0 74.0 73.0 ... 72.95 73.0
                                thetao              (time, j, i) float32 nan nan nan nan ... nan nan nan nan
                            Attributes: (12/52)
                                variant_label:               r1i1p1f1
                                mip_era:                     CMIP6
                                license:                     CMIP6 model data produced by The Government ...
                                contact:                     [email protected]
                                parent_variant_label:        r1i1p1f1
                                source_type:                 AOGCM
                                ...                          ...
                                physics_index:               1
                                branch_time_in_child:        60225.0
                                source:                      CanESM5 (2019): \naerosol: interactive\natmo...
                                initialization_index:        1
                                further_info_url:            https://furtherinfo.es-doc.org/CMIP6.CCCma.C...
                                intake_esm_dataset_key:      ScenarioMIP/CCCma/CanESM5/ssp370/Omon/gn

Let me know if everything looks good.

andersy005 Aug 08 '22 22:08

Thank you so much for doing this @andersy005, but I think we might be on slightly different pages about what I'm looking for. :sweat_smile:

What I ideally want is the simplest possible datatree that I can still do non-trivial operations on, but which still has some obvious physical interpretation that doesn't require extra thought for the person reading the documentation (who may not work in geoscience!).

If you look at the existing airtemps tutorial dataset we use in xarray, you can see it's fairly minimal and understandable.

<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

It's obvious what lat, lon, and time are, and that it contains air temperature data (it would be even clearer if air was renamed to temp but I digress...). There are no other dimensions or coordinates, and the list of attributes isn't too excessive.

The first dataset you showed is perhaps closest to this - it has two distinct types of data that lie on different grids for a good reason (i.e. ocean and atmosphere data). It also has historical data vs a projection, and at least some of the variable names are clear (i.e. O2).

What does smbb mean vs cmip6 in this context?

The CMIP6 version, which includes multiple models and multiple experiments, should suffice

However, I do also like this, because it gives a motivation for cross-node operations (such as comparing the results of two models).
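As a rough illustration of the kind of cross-node operation meant here, the sketch below builds two tiny synthetic "model" datasets and compares them. Everything in it is an assumption made for illustration: the node paths (`model_a/historical`, `model_b/historical`), the variable name `temp`, the grid sizes, and the random values are all made up and are not the sample data discussed in this thread. The tree is only assembled if the installed xarray exposes `DataTree`; the comparison itself works on the plain datasets.

```python
import numpy as np
import xarray as xr


def make_model(offset, seed):
    """Build a small synthetic temperature dataset (hypothetical data)."""
    rng = np.random.default_rng(seed)
    lat = np.linspace(-60.0, 60.0, 5)
    lon = np.linspace(0.0, 270.0, 4)
    time = np.arange(3)
    # Baseline of 15 degrees plus a per-model offset and some noise.
    temp = 15.0 + offset + rng.normal(size=(time.size, lat.size, lon.size))
    return xr.Dataset(
        {"temp": (("time", "lat", "lon"), temp)},
        coords={"time": time, "lat": lat, "lon": lon},
    )


# Hypothetical node paths: one node per model and experiment.
nodes = {
    "model_a/historical": make_model(offset=0.0, seed=0),
    "model_b/historical": make_model(offset=1.0, seed=1),
}

# Assemble a tree only if this xarray version ships DataTree.
DataTree = getattr(xr, "DataTree", None)
tree = DataTree.from_dict(nodes) if DataTree is not None else None

# Cross-node operation: difference between the two models' temperature fields.
diff = nodes["model_b/historical"] - nodes["model_a/historical"]
mean_diff = float(diff["temp"].mean())
```

Because both synthetic models share the same `lat`, `lon`, and `time` coordinates, the subtraction aligns without any regridding; real multi-model data would generally not be this convenient.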

That's just the CESM naming convention, which isn't CF-compliant. We can exclude this dataset...

Being CF-compliant isn't really the problem, it's that we want names that actually mean something to datatree users who are from unrelated fields of science. In fact we want to ensure that none of the documentation examples rely on cf-xarray for interpreting anything.

Thank you for sharing the notebook you used to create the data. I think instead of a back-and-forth the easiest way to proceed might be for me to mess with what you've already given me (which is great - I wouldn't even have known where to look!), then I'll put it in xarray-data. At that point we can merge this PR but just point it to that data. How does that sound?

TomNicholas Aug 08 '22 23:08

Being CF-compliant isn't really the problem, it's that we want names that actually mean something to datatree users who are from unrelated fields of science. In fact we want to ensure that none of the documentation examples rely on cf-xarray for interpreting anything.

I concur that understanding what some of these characteristics mean would require being familiar with the sample dataset in question. However, in my opinion, domain-specific datasets are valuable in addition to domain-agnostic ones because

  • some of the resulting datatree hierarchy is influenced by vocabulary and other domain-specific characteristics
  • it might make it easier for folks to map their use cases to the datatree model.

Perhaps the more the merrier? The documentation doesn't have to use all these sample datasets (having an archive of different/diverse datasets could come in handy).

What does smbb mean vs cmip6 in this context?

These are the forcing variants used in the CESM Large Ensemble simulations (e.g. smbb: Smoothed Biomass Burning). There's more explanation here: https://ncar.github.io/cesm2-le-aws/model_documentation.html

Thank you for sharing the notebook you used to create the data. I think instead of a back-and-forth the easiest way to proceed might be for me to mess with what you've already given me (which is great - I wouldn't even have known where to look!), then I'll put it in xarray-data. At that point we can merge this PR but just point it to that data.

You bet. This sounds good to me. Ping me if you need my input.

andersy005 Aug 09 '22 00:08

Perhaps the more the merrier? The documentation doesn't have to use all these sample datasets (having an archive of different/diverse datasets could come in handy).

Oh yes definitely! That's a good point - we could just merge these two datasets into xarray-data and have them as options for tutorial.open_datatree even if I later use a simplified version for some documentation examples.

TomNicholas Aug 09 '22 01:08