Replace OPeNDAP datasets with Xarray tutorial datasets in docs

Open tomvothecoder opened this issue 1 year ago • 1 comments

Description

  • Closes #277
  • Closes #675

Checklist

  • [ ] My code follows the style guidelines of this project
  • [ ] I have performed a self-review of my own code
  • [ ] My changes generate no new warnings
  • [ ] Any dependent changes have been merged and published in downstream modules

If applicable:

  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] New and existing unit tests pass with my changes (locally and CI/CD build)
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

tomvothecoder avatar Oct 03 '24 18:10 tomvothecoder

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.00%. Comparing base (c52b5a7) to head (b8b200a). Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #705   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           15        16    +1     
  Lines         1621      1658   +37     
=========================================
+ Hits          1621      1658   +37     

codecov[bot] avatar Oct 03 '24 18:10 codecov[bot]

For some of these examples, we probably need to host some ESGF datasets in an xcdat-data repo, similar to https://github.com/pydata/xarray-data. The datasets in xarray-data are subsetted on lat/lon, which means I can't plot a global color map. The plots look weird, and generating dummy datasets in-memory is not that simple (e.g., getting realistic tas data in a numpy array).

The added benefit of this approach is that we can use real-world datasets and it can help standardize our approach to testing.

tomvothecoder avatar Mar 13 '25 18:03 tomvothecoder

My proposed solution

  • [x] 1. Get the list of datasets used in the notebooks -- figure out which ones overlap between notebooks.

# Gentle Introduction
* "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"

# xCDAT utilities
* "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/E3SM/1_0/amip_1850_aeroF/1deg_atm_60-30km_ocean/atmos/180x360/time-series/mon/ens2/v3/TS_187001_189412.nc"
* "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/E3SM/1_0/amip_1850_aeroF/1deg_atm_60-30km_ocean/atmos/180x360/time-series/mon/ens2/v3/TS_189501_191912.nc"

# Spatial Averaging
* "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"
* "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/pr/gn/v20200605/pr_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"

# Temporal Averaging
* "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"
* "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/3hr/tas/gn/v20200605/tas_3hr_ACCESS-ESM1-5_historical_r10i1p1f1_gn_201001010300-201501010000.nc"

# Climatologies and departures
* "http://esgf.nci.org.au/thredds/dodsC/master/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"
# This dataset should not be downloaded. We can subset 
* "http://esgf.nci.org.au/thredds/dodsC/master/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/3hr/tas/gn/v20200605/tas_3hr_ACCESS-ESM1-5_historical_r10i1p1f1_gn_201001010300-201501010000.nc"

# Horizontal regridding
* "http://aims3.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CCCma/CanESM5/historical/r13i1p1f1/Amon/tas/gn/v20190429/tas_Amon_CanESM5_historical_r13i1p1f1_gn_185001-201412.nc"
* "http://aims3.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/abrupt-4xCO2/r1i1p1f1/day/tas/gr2/v20180701/tas_day_GFDL-CM4_abrupt-4xCO2_r1i1p1f1_gr2_00010101-00201231.nc"

# Vertical regridding
* "http://aims3.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Omon/so/gn/v20190308/so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc"
* "http://aims3.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Omon/thetao/gn/v20190308/thetao_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc"
* "http://aims3.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/abrupt-4xCO2/r1i1p1f1/day/tas/gr2/v20180701/tas_day_GFDL-CM4_abrupt-4xCO2_r1i1p1f1_gr2_00010101-00201231.nc"
  • [x] 2. Host those datasets on xcdat-data -- subsetted on time to keep each file under 100 MB (maybe 3-5 years?)
  • [x] 3. Update xc.tutorial.open_dataset() with paths to these files
  • [x] 4. Update Jupyter Notebook examples. -- IN PROGRESS
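
For step 1, here is a minimal sketch of one way to find the overlap between notebooks. It compares URLs by file name rather than by full URL, since the same file is served from different hosts in the listing above (e.g. esgf-data1.llnl.gov vs. esgf.nci.org.au). The helper name `find_shared_files` and the `NOTEBOOK_URLS` mapping are hypothetical and only cover a few of the notebooks; this is not the script actually used for this PR.

```python
from collections import defaultdict
from posixpath import basename
from urllib.parse import urlparse

# Host-specific prefixes taken from the URLs listed above.
LLNL = "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1"
NCI = "http://esgf.nci.org.au/thredds/dodsC/master/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1"

# File names taken from the URLs listed above.
TAS_AMON = "tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"
PR_AMON = "pr_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"
TAS_3HR = "tas_3hr_ACCESS-ESM1-5_historical_r10i1p1f1_gn_201001010300-201501010000.nc"

# Hypothetical mapping of notebook -> OPeNDAP URLs (a subset of the listing).
NOTEBOOK_URLS = {
    "Gentle Introduction": [f"{LLNL}/Amon/tas/gn/v20200605/{TAS_AMON}"],
    "Spatial Averaging": [
        f"{LLNL}/Amon/tas/gn/v20200605/{TAS_AMON}",
        f"{LLNL}/Amon/pr/gn/v20200605/{PR_AMON}",
    ],
    "Temporal Averaging": [
        f"{LLNL}/Amon/tas/gn/v20200605/{TAS_AMON}",
        f"{LLNL}/3hr/tas/gn/v20200605/{TAS_3HR}",
    ],
    "Climatologies and departures": [
        f"{NCI}/Amon/tas/gn/v20200605/{TAS_AMON}",
        f"{NCI}/3hr/tas/gn/v20200605/{TAS_3HR}",
    ],
}


def find_shared_files(notebook_urls):
    """Map each file name to the notebooks using it, keeping only shared files."""
    users = defaultdict(set)
    for notebook, urls in notebook_urls.items():
        for url in urls:
            # Compare by file name so mirrors of the same file count as one.
            users[basename(urlparse(url).path)].add(notebook)
    return {name: sorted(nbs) for name, nbs in users.items() if len(nbs) > 1}


shared = find_shared_files(NOTEBOOK_URLS)
for name, notebooks in shared.items():
    print(f"{name}: {', '.join(notebooks)}")
```

Files that appear in more than one notebook only need to be hosted once.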

tomvothecoder avatar Mar 13 '25 18:03 tomvothecoder

@tomvothecoder In my very quick glimpse I don't see any obviously noticeable issues! Notebooks are looking good to me. It's great to leverage xarray's sample datasets so we don't have to maintain our own. Thank you for your work for this PR!

lee1043 avatar Mar 18 '25 18:03 lee1043

> @tomvothecoder In my very quick glimpse I don't see any obviously noticeable issues! Notebooks are looking good to me. It's great to leverage xarray's sample datasets so we don't have to maintain our own. Thank you for your work for this PR!

Thanks for the review @lee1043! I actually decided to create xCDAT sample datasets (https://github.com/xCDAT/xcdat-data) which contain the same ESGF datasets but subsetted. This allows us to keep the same examples in the notebook. I found using the xarray sample datasets resulted in more significant changes in the notebook.
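
As a rough check on how much time subsetting is needed to stay under the 100 MB per file target from step 2, here is a back-of-the-envelope sketch. It assumes a single float32 variable on a 145 x 192 lat/lon grid (my assumption for the ACCESS-ESM1-5 atmosphere grid, not something stated in this thread), no compression, 365-day years, and it ignores coordinates and bounds. The helper names are hypothetical.

```python
def estimate_size_mb(n_time, n_lat, n_lon, bytes_per_value=4):
    """Uncompressed size in MB (10**6 bytes) of one (time, lat, lon) variable."""
    return n_time * n_lat * n_lon * bytes_per_value / 1e6


def max_time_steps(limit_mb, n_lat, n_lon, bytes_per_value=4):
    """Largest number of time steps that fits within limit_mb."""
    return limit_mb * 10**6 // (n_lat * n_lon * bytes_per_value)


# Assumed grid size (see the note above); adjust for other models.
N_LAT, N_LON = 145, 192

monthly_5yr = estimate_size_mb(5 * 12, N_LAT, N_LON)            # 60 time steps
three_hourly_5yr = estimate_size_mb(5 * 365 * 8, N_LAT, N_LON)  # 14600 time steps
steps_under_limit = max_time_steps(100, N_LAT, N_LON)

print(f"monthly, 5 years:  {monthly_5yr:.1f} MB")
print(f"3-hourly, 5 years: {three_hourly_5yr:.1f} MB")
print(f"time steps under 100 MB: {steps_under_limit}")
```

Under these assumptions, 3-5 years of monthly data fits comfortably, while the 3-hourly files need a much shorter time window (or compression) to stay under the limit.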

tomvothecoder avatar Mar 19 '25 16:03 tomvothecoder

@tomvothecoder If maintaining our own sample datasets is not a huge effort, I am not opposed to that. Thanks a lot!

lee1043 avatar Mar 19 '25 19:03 lee1043