E3SM icon indicating copy to clipboard operation
E3SM copied to clipboard

Throw error in case of duplicate output file request from yaml

Open ekdejong opened this issue 1 month ago • 12 comments

Addresses issue #7014 by checking for a duplicate *.nc file that would be created by a yaml using the same prefix, averaging type, and interval.

Commits have been tested for expected behavior on LC Intel (Dane) with an older version of E3SM.

Fixes #7014

[BFB]

ekdejong avatar Nov 25 '25 19:11 ekdejong

@bartgol , Emily is a PostDoc working with us at LLNL and she was hitting an error that matched this issue from almost a year ago. She came up with a fix so I encouraged her to submit a PR.

AaronDonahue avatar Nov 25 '25 19:11 AaronDonahue

Thanks for the PR @ekdejong . Just update the title to be a "verb noun" sentence about the PR instead of the branch name.

rljacob avatar Nov 25 '25 21:11 rljacob

@ekdejong The v1 tests fail the RUN phase, so something may be amiss with the logic you implemented.


BACKTRACE:
/home/runner/_work/E3SM/E3SM/components/eamxx/src/share/io/eamxx_output_manager.cpp:432
Error! Output file already exists: 
  - filename: ERS.ne4pg2_ne4pg2.F2010-SCREAMv1.ghci-snl-cpu_gnu.eamxx-prod.C.20251125_214248_yvz7p8.scream.1hourlyAVG_native.h.AVERAGE.nhours_x1.1994-10-07-00000.nc
  - timestamp: 1994-10-07-03600
This likely indicates that two yaml files were specified with identical interval, average type, and prefix.

FAILED CONDITION: 'false'

Did you test your fix with restarts? The failing tests are all restart tests. The ERS tests that passed do not produce any eamxx output file...

bartgol avatar Nov 26 '25 03:11 bartgol

you can add a check somewhere around this line to ensure output_yaml_files don't have duplicate entries (you can either do that in a nice or ugly way --- an example of a bad way is output_yaml_files = list(set(output_yaml_files)))

https://github.com/E3SM-Project/E3SM/blob/27dae13296008ddcdfd119428a6b19fc316e8e75/components/eamxx/cime_config/eamxx_buildnml.py#L1148

mahf708 avatar Nov 26 '25 18:11 mahf708

Thanks @mahf708 and @bartgol for pointing out these intricacies -- the restart / overwrite functionality is not something I thought of. I'll start over and follow @mahf708's suggestion...

ekdejong avatar Nov 26 '25 19:11 ekdejong

Tbc, I do think we should put checks in buildnml, but a more verbose runtime error explaining what went wrong is already an improvement. After all, even if the yaml files are correct, it is possible that other bugs cause an output mgr to attempt to open a file that is already open, in which case, a better error msg can help.

That said, if @ekdejong is fine diving in the buildnml part and impl a check there, I'm all for it. Not a must, though.

bartgol avatar Nov 26 '25 22:11 bartgol

@mahf708 and @bartgol I've added an additional check in the eamxx namelist file python setup as suggested by @mahf708 -- you'll have to tell me if it's a "nice or ugly way" ;P

I also added some additional logic to the original output manager hack that should allow for a useful error message but not interfere with restarts. In particular, there is now an option to add "allow_overwrite_existing_files: true" to one's yaml file if the user truly intends to overwrite existing output; else the flag will trigger a meaningful EKAT error, including a tip regarding that yaml option. However, if all of this is too complex, I can remove those changes to the output manager and we can just roll with the namelist python fix for now.

ekdejong avatar Dec 03 '25 23:12 ekdejong

@ekdejong, the python stuff is excellent! Actually better/smarter and more informative than I would have proposed here. Maybe a minor feedback Luca or Jim would likely insist on is avoiding the if block by putting the conditional inside expect (but that's just code style)

Honestly, if I were you, I would abandon the C++ route and let the three IO epicists 🙈🙉🙊 deal with it

mahf708 avatar Dec 04 '25 03:12 mahf708

oh btw, I triggered an AI review out of curiosity what it would say ;) and lol, it is mostly useless (love how it is insisting on uppercase YAML...)

mahf708 avatar Dec 04 '25 03:12 mahf708

@mahf708 @bartgol I've opted to take Naser's advice here -- I've updated the python code addition to use the proper expect syntax, but abandoned the C++ additions. I'll let the "IO epicists" take that one on instead ;)

ekdejong avatar Dec 09 '25 18:12 ekdejong

@ekdejong one last request: rather than have 12 commits, 10 of which are just pairs of "blah" and "Revert blah", can you rewrite the git history and remove those commits, leaving just the two that touch the py code, and then force push? No point in making the git history longer.

bartgol avatar Dec 11 '25 15:12 bartgol

@ekdejong one last request: rather than have 12 commits, 10 of which are just pairs of "blah" and "Revert blah", can you rewrite the git history and remove those commits, leaving just the two that touch the py code, and then force push? No point in making the git history longer.

Aye aye!

ekdejong avatar Dec 11 '25 23:12 ekdejong