Implement regular testing
Ideally, we would test key_metrics, make sure the notebooks aren't broken, and update our environments regularly.
This could be either a GitHub Action or a cron job. GeoCAT-examples has some continuous integration where they conda-uninstall packages and then pip-install those packages (with newer versions).
GeoCAT has two GitHub Actions workflows, "CI" and "CI Upstream"; the latter installs the conda environment and then runs install-upstream.sh to switch to (potentially newer) versions available via pip.
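For illustration only, here is a rough Python translation of what that kind of upstream step does (the real GeoCAT step is the install-upstream.sh shell script, and the package names below are just examples, not GeoCAT's actual list):

```python
# Illustrative sketch only: roughly what an "install upstream" step does,
# expressed via subprocess. The real GeoCAT step is a shell script, and these
# package names are examples rather than GeoCAT's actual list.
import subprocess

UPSTREAM_PACKAGES = ["xarray", "numpy"]  # assumption: packages to test against newer releases


def install_upstream(packages):
    # Remove the conda-installed copies (a real script might also force-remove
    # so that dependent packages stay installed)...
    subprocess.run(["conda", "remove", "--yes", *packages], check=True)
    # ...then reinstall the latest releases from PyPI.
    subprocess.run(["python", "-m", "pip", "install", "--upgrade", *packages], check=True)


if __name__ == "__main__":
    install_upstream(UPSTREAM_PACKAGES)
```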
Happy to help with this and/or provide context on some of our GeoCAT workflows and how we've approached testing. It'd be helpful to understand a little more about what you all are hoping to get out of testing, though. Maybe Teagan and I can chat about this today.
Absent additional context, I wonder if it might make sense to start off by writing up some initial tests that use data on GLADE and then think about leveraging something like CIRRUS for CI.
Potential low-hanging fruit:
- [ ] see if ADF has any tests we should use for timeseries
- [ ] general CUPiD code could be tested
- [ ] include tests on 'does this notebook run and not fail?' (see the papermill sketch after this list)
- [ ] pull out some Python functions from notebooks and add unit tests for them
- [ ] run something by hand when PRs are merged
- [ ] add documentation on adding a unit test / best practices
- [ ] cron job that runs key_metrics and checks for errors
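For the "does this notebook run?" checks and the cron-job idea, something along these lines could work. This is a minimal sketch, assuming the notebooks live under examples/key_metrics and can run on a generic python3 kernel (both placeholders), driven by pytest and papermill:

```python
# Minimal sketch (not an existing CUPiD test suite) of a pytest check that each
# notebook runs end to end without failing. The notebook directory and kernel
# name are placeholder assumptions; a real version would point at the
# key_metrics notebooks and could be triggered from a cron job or CI.
from pathlib import Path

import papermill as pm
import pytest

NOTEBOOK_DIR = Path("examples/key_metrics")  # assumed location of the notebooks
NOTEBOOKS = sorted(NOTEBOOK_DIR.glob("*.ipynb"))


@pytest.mark.parametrize("notebook", NOTEBOOKS, ids=lambda p: p.name)
def test_notebook_executes(notebook, tmp_path):
    """papermill raises PapermillExecutionError if any cell errors out."""
    pm.execute_notebook(
        str(notebook),
        str(tmp_path / notebook.name),  # executed copy goes to a scratch dir
        kernel_name="python3",
    )
```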
Potential higher-hanging fruit:
- [ ] image comparisons (see the matplotlib sketch after this list)
- [ ] CIRRUS
- [ ] notebook testing capabilities?
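On the image-comparison item: matplotlib ships a testing helper that compares a generated figure against a stored baseline, and pytest-mpl wraps the same idea. A rough sketch, with placeholder file paths and a tolerance that would need tuning:

```python
# Rough sketch of a baseline-image comparison; the file paths are placeholders
# and the tolerance would need tuning for acceptable rendering differences.
from matplotlib.testing.compare import compare_images


def test_plot_matches_baseline():
    # compare_images returns None when the images match within `tol` (an RMS
    # threshold), otherwise a message describing the mismatch.
    result = compare_images(
        "baselines/sst_bias.png",  # assumed baseline image committed to the repo
        "output/sst_bias.png",     # assumed image produced by the current run
        tol=5,
    )
    assert result is None, result
```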
Side conversation: it would be worthwhile to pin packages to avoid potential user issues, and then update those pins infrequently (Sam Rabin's suggestion).
Relatedly, it seems that cupid-infrastructure is not finding these packages, although pip installs them fine?
- intake-esm
- nco
- papermill
- ploomber=0.22.3
- jupyter-book
Re: notebook testing, it looks like nbdime might be helpful here. I haven't personally used it, but it's part of the Jupyter project and has been around for a while now. I may look into this a bit more for some other projects as well. As a first step, even just running notebooks regularly and checking for errors can be really helpful; this is what Pythia and some of the GeoCAT galleries are doing, for example.
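For what it's worth, nbdime also exposes a Python API, so a check could diff a freshly executed notebook against the committed copy. An untested sketch, with placeholder paths:

```python
# Untested sketch: flag when a re-executed notebook differs from the committed
# reference copy. Paths are placeholders.
import nbformat
from nbdime import diff_notebooks


def outputs_changed(reference_path, fresh_path):
    """Return True if nbdime reports any differences between the two notebooks."""
    reference = nbformat.read(reference_path, as_version=4)
    fresh = nbformat.read(fresh_path, as_version=4)
    return len(diff_notebooks(reference, fresh)) > 0


if __name__ == "__main__":
    if outputs_changed("committed/key_metrics.ipynb", "executed/key_metrics.ipynb"):
        print("Notebook differs from the committed version")
```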
Another more disruptive approach might be to use something like Jupytext, but I suspect that's not the direction you'd like to go.
Regardless, getting as much code as possible out of notebooks and into scripts/modules so it can be reused and tested is probably wise.
To improve reproducibility and help avoid environment-related issues, something like conda-lock could be worth leveraging: it makes it easy to generate fully pinned environment specification files and reproduce environments from them.
Also worth noting that CSG has spun up a machine that can be used for GitHub runners and has access to GLADE. If we can get connected to that, we can run key_metrics and report any errors (if we started using it now, it should catch that the land notebook doesn't run completely; #286 would then fix it).