Implement regular testing
Ideally, we would test key_metrics, make sure the notebooks aren't broken, and update our environments regularly.
This could be either a GitHub Action or a cron job. GeoCAT-examples has some continuous integration where they conda-uninstall packages and then pip-install those packages (with newer versions).
GeoCAT has two GitHub Actions workflows, "CI" and "CI Upstream"; the latter installs the conda environment and then runs install-upstream.sh to switch to (potentially newer) versions available via pip.
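For illustration only, here is a rough Python translation of what that kind of upstream step does (the real GeoCAT step is the install-upstream.sh shell script, and the package names below are just examples, not GeoCAT's actual list):

```python
# Illustrative sketch only: roughly what an "install upstream" step does,
# expressed via subprocess. The real GeoCAT step is a shell script, and these
# package names are examples rather than GeoCAT's actual list.
import subprocess

UPSTREAM_PACKAGES = ["xarray", "numpy"]  # assumption: packages to test against newer releases


def install_upstream(packages):
    # Remove the conda-installed copies (a real script might also force-remove
    # so that dependent packages stay installed)...
    subprocess.run(["conda", "remove", "--yes", *packages], check=True)
    # ...then reinstall the latest releases from PyPI.
    subprocess.run(["python", "-m", "pip", "install", "--upgrade", *packages], check=True)


if __name__ == "__main__":
    install_upstream(UPSTREAM_PACKAGES)
```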
Happy to help with this and/or provide context on some of our GeoCAT workflows and how we've approached testing. It'd be helpful to understand a little more about what you all are hoping to get out of testing, though. Maybe Teagan and I can chat about this today.
Absent additional context, I wonder if it might make sense to start off by writing up some initial tests that use data on GLADE and then think about leveraging something like CIRRUS for CI.
Potential low-hanging fruit:
- [ ] see if ADF has any tests we should use for timeseries
- [ ] general CUPiD code could be tested
- [ ] include tests on 'does this notebook run and not fail?' (see the papermill sketch after this list)
- [ ] pull out some Python functions from notebooks and add unit tests for them
- [ ] run something by hand when PRs are merged
- [ ] add documentation on adding a unit test / best practices
- [ ] cron job that runs key_metrics and checks for errors
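For the "does this notebook run?" checks and the cron-job idea, something along these lines could work. This is a minimal sketch, assuming the notebooks live under examples/key_metrics and can run on a generic python3 kernel (both placeholders), driven by pytest and papermill:

```python
# Minimal sketch (not an existing CUPiD test suite) of a pytest check that each
# notebook runs end to end without failing. The notebook directory and kernel
# name are placeholder assumptions; a real version would point at the
# key_metrics notebooks and could be triggered from a cron job or CI.
from pathlib import Path

import papermill as pm
import pytest

NOTEBOOK_DIR = Path("examples/key_metrics")  # assumed location of the notebooks
NOTEBOOKS = sorted(NOTEBOOK_DIR.glob("*.ipynb"))


@pytest.mark.parametrize("notebook", NOTEBOOKS, ids=lambda p: p.name)
def test_notebook_executes(notebook, tmp_path):
    """papermill raises PapermillExecutionError if any cell errors out."""
    pm.execute_notebook(
        str(notebook),
        str(tmp_path / notebook.name),  # executed copy goes to a scratch dir
        kernel_name="python3",
    )
```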
Potential higher-hanging fruit:
- [ ] image comparisons (see the matplotlib sketch after this list)
- [ ] CIRRUS
- [ ] notebook testing capabilities?
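On the image-comparison item: matplotlib ships a testing helper that compares a generated figure against a stored baseline, and pytest-mpl wraps the same idea. A rough sketch, with placeholder file paths and a tolerance that would need tuning:

```python
# Rough sketch of a baseline-image comparison; the file paths are placeholders
# and the tolerance would need tuning for acceptable rendering differences.
from matplotlib.testing.compare import compare_images


def test_plot_matches_baseline():
    # compare_images returns None when the images match within `tol` (an RMS
    # threshold), otherwise a message describing the mismatch.
    result = compare_images(
        "baselines/sst_bias.png",  # assumed baseline image committed to the repo
        "output/sst_bias.png",     # assumed image produced by the current run
        tol=5,
    )
    assert result is None, result
```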
Side conversation: it would be worthwhile to pin packages to avoid potential user issues, and then update those pins infrequently (Sam Rabin's suggestion).
Relatedly, it seems that cupid-infrastructure is not finding these packages, although pip installs them fine?
- intake-esm
- nco
- papermill
- ploomber=0.22.3
- jupyter-book
Re: notebook testing, it looks like nbdime might be helpful here. I haven't personally used it, but it's part of the Jupyter project and has been around for a while now. I may look into this a bit more for some other projects as well. As a first step, even just running notebooks regularly and checking for errors can be really helpful; this is what Pythia and some of the GeoCAT galleries are doing, for example.
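For what it's worth, nbdime also exposes a Python API, so a check could diff a freshly executed notebook against the committed copy. An untested sketch, with placeholder paths:

```python
# Untested sketch: flag when a re-executed notebook differs from the committed
# reference copy. Paths are placeholders.
import nbformat
from nbdime import diff_notebooks


def outputs_changed(reference_path, fresh_path):
    """Return True if nbdime reports any differences between the two notebooks."""
    reference = nbformat.read(reference_path, as_version=4)
    fresh = nbformat.read(fresh_path, as_version=4)
    return len(diff_notebooks(reference, fresh)) > 0


if __name__ == "__main__":
    if outputs_changed("committed/key_metrics.ipynb", "executed/key_metrics.ipynb"):
        print("Notebook differs from the committed version")
```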
Another more disruptive approach might be to use something like Jupytext, but I suspect that's not the direction you'd like to go.
Regardless, getting as much code as possible out of notebooks and into scripts/modules so it can be reused and tested is probably wise.
To improve reproducibility and help avoid environment-related issues, something like conda-lock could be worth leveraging: it makes it easy to generate fully pinned environment specification files and reproduce environments from them.
Also worth noting that CSG has spun up a machine that can be used for GitHub runners and has access to GLADE. If we can get connected to that, we can run key_metrics and report any errors (if we started using it now, it should catch that the land notebook doesn't run completely; #286 would then fix it).