CUPiD icon indicating copy to clipboard operation
CUPiD copied to clipboard

CVDP improvements

Open TeaganKing opened this issue 3 months ago • 13 comments

  • [ ] From a discussion with @justin-richling , the run_adf_diag command finishes before CVDP is actually complete because CVDP runs independently from ADF in the background. We will need to figure out how to get cupid-diagnostics to run after CVDP finishes running in the CESM workflow, and it may be helpful if we can make sure CVDP finishes before that command exits. We should check long the CVDP takes to run and whether it would be worth putting it in series with the ADF so it becomes the last thing that runs.

  • [ ] CVDP should be enabled via CUPiD in the CESM workflow.

  • [ ] We should also implement some mechanism for not including link_to_CVDP in webpage output if cvdp_run from link_to_ADF.ipynb parameters is False.

  • [ ] See #298

  • [ ] The links to CVDP/ADF/LDF all seem to be quite fragile with their naming conventions. It would be great to make this a bit more robust. I think the issue is related to start/end years... See #268

TeaganKing avatar Oct 10 '25 21:10 TeaganKing

RE Item 3: The place where the CVDP notebook gets added to the ploomber task list is here. We could add an "if cvdp_run is false, don't add link_to_cvdp.ipynb to all_nbs" for both CVDP and ILAMB. Currently, those notebooks show up even if CVDP or ILAMB is not run, which is a bit of an unfortunate tease for users.

TeaganKing avatar Oct 16 '25 21:10 TeaganKing

Additional items related to CVDP and other externals:

  • [x] The CVDP link currently opens a new tab rather than redirecting the current one. This is really useful so as not to have to navigate back in a browser to get to the CUPiD homepage. We should update LDF and ADF to do the same (or, at least they should all be consistent).

  • [x] The actual links in the notebooks are a bit hidden by all of the other code, etc. Maybe we can increase the font size or bold the link or change the cell color or something.

TeaganKing avatar Oct 28 '25 14:10 TeaganKing

Hey @justin-richling ! I turned on cvdp_true in the adf_config.yml file in CUPiD, and CVDP_output directory appears and has these contents, but does not seem to finish running. Any idea what may be happening here?

  • cvdp.out (which is empty)
  • CVDP_readme.pdf
  • driver.b.e30_alpha07c_cesm.B1850C_LTso.ne30_t232_wgx3.249.ncl
  • driver.ncl
  • namelist
  • namelist_obs
  • ncl_scripts A few examples of this are here: /glade/work/tking/cupid_project/other_cupids/nov17/CUPiD/examples/

Also did CVDP end up being set to run after ADF finishes or start at the same time, or some other timing?

TeaganKing avatar Nov 18 '25 00:11 TeaganKing

Hi @TeaganKing The code in the ADF hasn't changed yet so the CVDP still starts in the middle of the ADF. I'm guessing the CVDP just hadn't gotten to making output yet, did it eventually get plots? I think there might be a problem with this version of the CVDP that it needs lat/lon gridded data, so that would be something to check too if it didn't make output. The cvdp.out is the log so that might be worth exploring too if so. Let me know if there is still no output and I can take a look!

justin-richling avatar Nov 18 '25 16:11 justin-richling

Right, there's still no output-- eg, see here: /glade/work/tking/cupid_project/other_cupids/nov17/CUPiD/examples/247v234/CVDP_output/b.e30_alpha07c_cesm.B1850C_LTso.ne30_t232_wgx3.247/

I think this is a lat/lon file. The cvdp.out file also has nothing in it.

TeaganKing avatar Nov 18 '25 17:11 TeaganKing

Yeah, something weird is happening here, both cases are on the lat/lon grid and the ADF made output, so I'm a little confused why nothing is even being written to cvdp.out...

The only thing that I can think is in the test case b.e30_alpha07c_cesm.B1850C_LTso.ne30_t232_wgx3.247 time series directory that the ADF config file is pointing to has two sets of time series files for each var, one set goes from 1-49, the other 1-48. I wonder if there is an issue about having multiple time series files if the CVDP chokes here?

Tagging @phillips-ad to see if any of this makes sense

justin-richling avatar Nov 18 '25 20:11 justin-richling

Yes Justin you got it. The CVDP will get confused if there's multiple timeseries present in a single directory for the same run and variable. Is it possible to remove the older set (from years 1-48 in this case) of data? There's no reason to have the older set once the newer set gets formed.

phillips-ad avatar Nov 18 '25 20:11 phillips-ad

Thanks for chiming in so fast! This is good to know for debugging in the future. Just curious, would this behave the same in 6.1?

justin-richling avatar Nov 18 '25 20:11 justin-richling

Yes, the behavior would be the exact same. This has been the case since the first version of the CVDP. One could easily say I should have coded it better in this aspect, and I wouldn't disagree.

phillips-ad avatar Nov 18 '25 20:11 phillips-ad

Looking at /glade/work/tking/cupid_project/other_cupids/nov17/CUPiD/examples/247v234/CVDP_output/b.e30_alpha07c_cesm.B1850C_LTso.ne30_t232_wgx3.247/ the namelist_byvar directory hasn't been created, which says to me that the package likely hasn't started as that is the first step.

phillips-ad avatar Nov 18 '25 20:11 phillips-ad

Ok, just to be certain this was probably due to the case having multiple time series, correct?

@TeaganKing I modified the CVDP call in the ADF so it will always run after time series creation (this is how it should be, this is currently a bug), I just need to bring those changes into CUPiD, but it will still silently run in the background. So the two options here are:

  1. keep it silently in the background and the user just has to essentially check the CVDP_output/output/ for the index.html file which should be the last file created signaling the CVDP has finished or

  2. We can modify the ADF further to not have the CVDP in the background and place the CVDP call in series to be the last item run with the rest of the ADF.

I guess I would lean towards 1) just to keep from changing the main code of the ADF, this wouldn't be bad, just would require some more testing on my end. 2) would likely be slower since the CVDP currently runs in parallel, not in series with the ADF.

justin-richling avatar Nov 18 '25 20:11 justin-richling

Hi @justin-richling and @phillips-ad , thanks for chiming in here. I ran CVDP again after removing the extra time series files (and making sure the config file points to the existing ones), and still ended up with an essentially empty CVDP_output directory.

My main concern with 1) is just that when automating through the CESM workflow, this means that CVDP may not be done running before CVDP.ipynb is run with cupid-diagnostics & cupid-webpage-- so the CVDP-generated plots may not make it in to the jupyterbook.

TeaganKing avatar Nov 19 '25 14:11 TeaganKing

@TeaganKing, could CVDP.ipynb be set to run only after the index.html file is present in the CVDP_output/output directory?

phillips-ad avatar Nov 19 '25 18:11 phillips-ad