
Update WRF-DART interface to support WRF4

Open · braczka opened this pull request 1 year ago • 8 comments

Description:

Makes WRF-DART scripting compatible with WRF4 as outlined in Issue #661. Also updates the documentation in the WRF tutorial and on the main model page, advising of this change and describing the limits of backward compatibility with WRFv3.9 and earlier.

Fixes issue

Fixes #661

Types of changes

  • [X] Bug fix (non-breaking change which fixes an issue)
  • [X] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [X] Documentation update

Documentation changes needed?

  • [ ] My change requires a change to the documentation.
  • [X] I have updated the documentation accordingly.

Tests

Performed testing against the WRF-DART tutorial example as outlined in PR #650.

Checklist for merging

  • [ ] Updated changelog entry
  • [X] Documentation updated
  • [ ] Update conf.py

Checklist for release

  • [ ] Merge into main
  • [ ] Create release from the main branch with appropriate tag
  • [ ] Delete feature-branch

Testing Datasets

  • [ ] Dataset needed for testing available upon request
  • [ ] Dataset download instructions included
  • [X] No dataset needed

braczka avatar Apr 29 '24 18:04 braczka

nice stuff @braczka

hkershaw-brown avatar Apr 29 '24 18:04 hkershaw-brown

Just remembered -- I still need to make minor updates to the tutorial tar.gz file, and documentation:

cd $BASE_DIR
wget http://www.image.ucar.edu/wrfdart/tutorial/wrf_dart_tutorial_23May2018_v3.tar.gz
tar -xzvf wrf_dart_tutorial_23May2018_v3.tar.gz

The perturbations, surface file, and initial conditions were generated from a pre-WRF4 version, but we can still use the same data to run the tutorial example. I just need to add a few more lines of documentation so the user is not confused. I will also update the README files in the tar.gz file.

braczka avatar Apr 29 '24 20:04 braczka

Just to document our conversation from DART standup today: my testing of these changes has been limited to the tutorial test case outlined in PR https://github.com/NCAR/DART/pull/650, where the results looked similar to the original WRFv3.9 test case. A thorough evaluation of the impact of these changes on forward operators (e.g., radio occultation, radar) has not been done. We may need to rely on our user base to test these impacts.

braczka avatar Apr 30 '24 17:04 braczka

ok one last comment I promise, do you want to fix #672 in this pull request? (update the wrf/work/input.nml)

hkershaw-brown avatar May 07 '24 19:05 hkershaw-brown

ok one last comment I promise, do you want to fix #672 in this pull request? (update the wrf/work/input.nml)

Yes, I should include #672, and probably #660 as well, to clean up those two issues.

braczka avatar May 08 '24 22:05 braczka

I have been getting a new error since the last shutdown of Derecho, when executing the step ./driver.csh 2017042706 param.csh >& run.out &. The error is related to adding perturbations to the wrfinput file:

host is  dec2323
assim_advance.csh is running in /glade/derecho/scratch/bmraczka/WRFv4.5_Tutorial/rundir
new_advance_model.csh is running in /glade/derecho/scratch/bmraczka/WRFv4.5_Tutorial/rundir
/glade/derecho/scratch/bmraczka/WRFv4.5_Tutorial/rundir
use wrfvar set
stuff var  U
wrf.info is read
1
/glade/derecho/scratch/bmraczka/WRFv4.5_Tutorial/rundir/WRF/wrfbdy_d01_152057_43200_mean
Error! Non-zero status returned from add_bank_perts.ncl. Check /glade/derecho/scratch/bmraczka/WRFv4.5_Tutorial/rundir/advance_temp1/add_perts.err.
warning:_NclOpenFile: Can not open file <wrfvar_output>; file format not supported or file is corrupted
fatal:file (wrf_in) isn't defined
fatal:["Execute.c":8637]:Execute: Error occurred at or near line 55 in file /glade/derecho/scratch/bmraczka/WRFv4.5_Tutorial/rundir/advance_temp1/add_bank_perts.ncl

duration = 13

The source of the issue seems to be that two calls are made to assim_advance.csh for ensemble member 1, as diagnosed by the two log files that are created:

-rw-r--r-- 1 bmraczka ncar 858 May  9 16:22 assim_advance_1.o4413480
-rw-r--r-- 1 bmraczka ncar 858 May  9 16:22 assim_advance_1.o4413479

The scripts operating simultaneously seem to compete for access to the wrfvar_output file, leading to the error. The solution seems to be adding some sleep commands within the driver.csh script, so the script can detect that assim_advance.csh has already started before generating a second submission. I will include this commit within this PR.
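For reference, here is a minimal csh sketch of the kind of pause-and-check logic described above. The flag-file name, its location, and the 300-second limit are illustrative assumptions, not the actual commit:

#!/bin/csh
# Hypothetical sketch only -- assumes assim_advance.csh drops a flag file
# (start_member_1) into the run directory when it begins, so driver.csh can
# wait for the first submission to start instead of generating a second one.
set n = 1
set rundir = $cwd                          # placeholder; driver.csh already knows its run directory
set flag = ${rundir}/start_member_${n}
@ waited = 0
while ( (! -e ${flag}) && (${waited} < 300) )
   sleep 10                                # give PBS time to launch assim_advance_${n}
   @ waited = ${waited} + 10
end
if ( ! -e ${flag} ) echo "WARNING: member ${n} has not started after ${waited} seconds"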

braczka avatar May 10 '24 18:05 braczka

Brett, could this be due to the CPU binding issue we experienced in WRF-Hydro? If you would like to test our fix, here is the environment command: export PALS_CPU_BIND=none (or setenv in csh)

You can add this in your submission script right after the PBS preamble.
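For anyone else testing this, a minimal sketch of where the setting would sit in a csh submission script; the PBS directives below are placeholders, not the tutorial's actual preamble:

#!/bin/csh
# Placeholder PBS preamble -- adjust to match your own submission script
#PBS -N assim_advance_1
#PBS -A project_code
#PBS -q main
#PBS -l walltime=00:30:00
#PBS -l select=1:ncpus=128:mpiprocs=128

# Disable PALS CPU binding (the WRF-Hydro workaround) before the MPI launch
setenv PALS_CPU_BIND none

# ... remainder of the existing script ...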

mgharamti avatar May 10 '24 18:05 mgharamti

Brett, could this be due to the CPU binding issue we experienced in WRF-Hydro? If you would like to test our fix, here is the environment command: export PALS_CPU_BIND=none (or setenv in csh)

You can add this in your submission script right after the PBS preamble.

Hmmm -- not sure, but I will test that too !

braczka avatar May 10 '24 18:05 braczka

@hkershaw-brown I think I have addressed all of your concerns. Some of the additional changes we discussed at standup are not optimal (adding scripting pauses to the csh scripts); a more robust fix would likely require a more substantial refactor. I will meet with WRF users from EOL and RAL to better assess the need for that refactor. However, this PR's fixes for the hybrid coordinate system and the T to THM switch are important to get to the community.
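For context, the T to THM switch shows up in wrf/work/input.nml roughly as sketched below. The entries here are illustrative only (the full list and exact quantities should be taken from the updated input.nml in this PR); the point is that the WRF4 prognostic variable THM replaces T in wrf_state_variables:

&model_nml
   wrf_state_variables = 'U',   'QTY_U_WIND_COMPONENT',      'TYPE_U', 'UPDATE', '999',
                         'V',   'QTY_V_WIND_COMPONENT',      'TYPE_V', 'UPDATE', '999',
                         'THM', 'QTY_POTENTIAL_TEMPERATURE', 'TYPE_T', 'UPDATE', '999',
   /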

braczka avatar May 15 '24 16:05 braczka

Brett, could this be due to the CPU binding issue we experienced in WRF-Hydro? If you would like to test our fix, here is the environment command: export PALS_CPU_BIND=none (or setenv in csh) You can add this in your submission script right after the PBS preamble.

Hmmm -- not sure, but I will test that too !

@mgharamti The PALS_CPU_BIND variable did not seem to influence the WRF simulation. I was also going to test it against the slow performance with CLM-DART, but that appears to be related to compression of large data files as part of the campaign migration.

braczka avatar May 15 '24 16:05 braczka