add capability to dynamically generate GSI *info files

Open jswhit2 opened this issue 8 months ago • 6 comments

Description

Currently config.anal has an if/then/else block that sets the env vars SATINFO, OZINFO, and CONVINFO, which contain paths to different GSI info files for different periods over the last several years. For reanalysis, we need a solution that works back to 1979. Rather than add to the if/then/else block in config.anal, I have created a set of scripts that generate *info files dynamically given a date (https://github.com/NOAA-PSL/build_gsinfo). What is needed is a way to use those scripts in global-workflow. Note this is only needed for GSI; for JEDI, the needed functionality is included in observation_chronicle (https://github.com/NOAA-EMC/jcb-gdas/tree/develop/observation_chronicle/atmosphere).
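To illustrate why the date-range approach becomes unwieldy for a record going back to 1979, here is a minimal sketch of the pattern. The function name, cutoff dates, and file names are placeholders invented for illustration; they are not the values in config.anal.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the date-range pattern described above.
# Cutoff dates and file names are placeholders, not values from config.anal.
select_satinfo() {
    local cdate="$1"   # analysis date as YYYYMMDDHH
    if (( cdate >= 2020010100 )); then
        echo "satinfo_period_c.txt"
    elif (( cdate >= 2000010100 )); then
        echo "satinfo_period_b.txt"
    else
        echo "satinfo_period_a.txt"
    fi
}
```

Every new observing-system period adds another branch to a block like this, which is the growth that generating the files from a date is meant to avoid.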

Resolves https://github.com/NOAA-EMC/global-workflow/issues/3293

Requires https://github.com/NOAA-EMC/GSI-fix/pull/28

Enabled via the USE_BUILD_GSINFO env var, which can be set to YES in config.base (default is NO). Three new scripts are added (create_satinfo.sh, create_ozinfo.sh, create_convinfo.sh). These scripts generate the GSI *info files for a given analysis date using data from build_gsinfo (which will live inside GSI-fix once https://github.com/NOAA-EMC/GSI-fix/pull/28 is merged).
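As a rough sketch of how a YES/NO switch like this can gate the generation step: the function name `plan_gsinfo` and the "static" placeholder are hypothetical, and because the arguments of the three scripts are not shown in this thread, the sketch only reports which scripts would run rather than calling them.

```bash
#!/usr/bin/env bash
# Sketch only: the real scripts' arguments are not shown in this thread,
# so this function just reports which generators would run.
plan_gsinfo() {
    local use_build="${USE_BUILD_GSINFO:-NO}"   # default NO, as in the description
    local script
    if [[ "${use_build}" == "YES" ]]; then
        for script in create_satinfo.sh create_ozinfo.sh create_convinfo.sh; do
            echo "${script}"
        done
    else
        echo "static"   # placeholder for the existing config.anal paths
    fi
}
```

Defaulting the variable inside the function keeps the existing behavior when USE_BUILD_GSINFO is unset.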

The OBS_INPUT table in the GSI namelist is removed from exglobal_atmos_analysis.sh to allow for separate options for NCEP ops and reanalysis. Both versions are included as text files in build_gsinfo/obs_input. The OBS_INPUT env var can be used to choose which version to use. The NCEP ops version is the default in config.anal.
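One way to picture reading the table from a file instead of hard-coding it in the script is sketched below. It assumes OBS_INPUT holds the path to one of the text files; the thread does not show the actual mechanism, and the function name `emit_obs_input` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch under an assumption: OBS_INPUT holds the path to a text file
# containing the table. The thread does not show the actual mechanism.
emit_obs_input() {
    local table_file="$1"
    if [[ ! -r "${table_file}" ]]; then
        echo "FATAL: OBS_INPUT table not readable: ${table_file}" >&2
        return 1
    fi
    # Print the table verbatim so the caller can splice it into the namelist.
    cat "${table_file}"
}
```

Failing early on an unreadable file avoids writing a namelist with an empty observation table.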

A workaround for https://github.com/NOAA-EMC/GSI/issues/752 is included in exglobal_atmos_analysis.sh (pointing to a separate directory for HIRS coefficient files). This hack can be removed once https://github.com/NOAA-EMC/GSI/issues/783 is merged.

Type of change

  • [ ] Bug fix (fixes something broken)
  • [x] New feature (adds functionality)
  • [ ] Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO

  • Does this change require a documentation update? YES

  • Does this change require an update to any of the following submodules? YES

    • [x] GSI-fix https://github.com/NOAA-EMC/GSI-fix/pull/28

How has this been tested?

  • Clone and build on gaeac6 and hera
  • Cycled test on gaeac6 and hera

Checklist

  • [ ] Any dependent changes have been merged and published
  • [x] My code follows the style guidelines of this project
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have documented my code, including function, input, and output descriptions
  • [x] My changes generate no new warnings
  • [x] New and existing tests pass with my changes
  • [ ] This change is covered by an existing CI test or a new one has been added
  • [ ] Any new scripts have been added to the .github/CODEOWNERS file with owners
  • [ ] I have made corresponding changes to the system documentation if necessary

jswhit2 avatar Mar 18 '25 18:03 jswhit2

@jack-woollen @jswhit2 Jeff I didn't figure out how to update your branch with the convinfo file containing complete satwnd definitions. Instead you can copy from /work2/noaa/da/jwoollen/RAEXPS/scripts/2021ozn1/build_gsinfo/convinfo/merged_convinfo.txt.

jswhit2 avatar Mar 18 '25 18:03 jswhit2

@ClaraDraper-NOAA I think you forgot to include the link. Would you be able to tell us specifically which lines to add to the reanalysis version of convinfo (https://github.com/NOAA-PSL/build_gsinfo/blob/main/convinfo/merged_convinfo.txt)? The anavinfo changes won't be needed for the scout runs until we start running ensemble DA.

jswhit avatar Apr 03 '25 18:04 jswhit

@jswhit2 To include the soil analysis in the reanalysis / scout runs, we'll need to make changes to the convinfo and anavinfo. For the NRT system, those changes are here.

Can you please add these changes into your [conv/anav]info files? I'm not sure if it's better to include them always, or as an option.

I did forget the link! It's here

ClaraDraper-NOAA avatar Apr 03 '25 19:04 ClaraDraper-NOAA

If we want to assimilate 2m obs into the scout run (maybe not a great idea???), you can see the changes you need by running the following in fix/gsi/:

diff global_convinfo_2mObs.txt global_convinfo.txt

If we just want to monitor them, you can leave convinfo as is. For monitoring or assim, we need to add t2m and q2m to the met_guess and state_derivative namelists in anavinfo. There's an example here on hera:

/scratch2/BMC/gsienkf/Clara.Draper/gerrit-hera/global-workflow_CNTRL/fix/global_anavinfo_2mDiag.l127.txt

ClaraDraper-NOAA avatar Apr 03 '25 19:04 ClaraDraper-NOAA

OK got it - thanks @ClaraDraper-NOAA

jswhit avatar Apr 03 '25 19:04 jswhit

Moving to draft while upstream issues are addressed. @jswhit feel free to re-mark this as ready for review when it is so.

DavidHuber-NOAA avatar Apr 10 '25 12:04 DavidHuber-NOAA

A few minor nit-picks. There are also a number of shell-check issues noted here: https://github.com/NOAA-EMC/global-workflow/actions/runs/14601534590/job/40960623756?pr=3472. Could you please also address these?

@DavidHuber-NOAA those shell-check errors do not appear to be coming from lines I modified so I hesitate to change them.

jswhit2 avatar May 23 '25 16:05 jswhit2

@jswhit2 : GSI-fix PR #28 has been merged into GSI-fix develop. Done at 35c06c0

RussTreadon-NOAA avatar May 27 '25 18:05 RussTreadon-NOAA

@jswhit2 I have addressed the shellcheck issues in exglobal_atmos_analysis.sh and exglobal_diag.sh.

DavidHuber-NOAA avatar Jun 02 '25 15:06 DavidHuber-NOAA

To reproduce the gfsv17_historical GSI *info files, the build_gsinfo dir in GSI-fix will need to be updated to match https://github.com/NOAA-PSL/build_gsinfo-fix/pull/3. It will not reproduce them bit for bit, since I've turned off some instruments that no longer exist, but are still turned on in satinfo (like amsua_metop-a and amsua_aqua).

jswhit avatar Sep 17 '25 20:09 jswhit

I think all the requested changes have been made and this PR is ready to go once @RussTreadon-NOAA and @ClaraDraper-NOAA finish their reviews.

jswhit avatar Sep 29 '25 16:09 jswhit

@DavidHuber-NOAA curious about why you made this change. Seems like the correct place to look is ${FIXgsi}/build_gsinfo.

jswhit2 avatar Sep 30 '25 15:09 jswhit2

@jswhit that change goes along with this one: https://github.com/jswhit2/global-workflow/pull/2/commits/8167654653d5f97a8bb625840208e87badae4f66. The latter creates links from the build_gsinfo contents (satinfo, ozinfo, etc) to parm/gsinfo (I made some additional commits after this to correct some bugs in the linking; you can see the full, correct list here).

The reason for this change set is that we can link directly to the gsinfo files in the build_gsinfo submodule and thus not have to stage new GSI fix file sets on all of the platforms. Since the build_gsinfo submodule is version controlled, I wanted to avoid staging it.
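The linking idea can be sketched roughly as follows. This is not the code from the commits referenced above; the function name `link_gsinfo` is hypothetical, and it simply links every top-level entry it finds in the source directory.

```bash
#!/usr/bin/env bash
# Sketch only, not the linking code from the PR: symlink each top-level
# entry of a build_gsinfo checkout into a destination such as parm/gsinfo.
link_gsinfo() {
    local src_dir="$1" dest_dir="$2" entry
    mkdir -p "${dest_dir}"
    for entry in "${src_dir}"/*; do
        [[ -e "${entry}" ]] || continue   # skip the literal glob if src_dir is empty
        # -s symbolic, -f replace an existing link, -n do not descend into a linked dir
        ln -sfn "${entry}" "${dest_dir}/$(basename "${entry}")"
    done
}
```

Passing an absolute source path keeps the links valid regardless of the directory they are resolved from.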

DavidHuber-NOAA avatar Sep 30 '25 17:09 DavidHuber-NOAA

@jswhit2 @ClaraDraper-NOAA I ran a test case that generates the 2m observation info files along with the other GSI *info files for a C96 analysis and for cycle 2025090100. The run directory for the analysis can be found here: /scratch4/NCEPDEV/stmp/David.Huber/RUNDIRS/gsinfo/gdas.2025090100/anal.2297506

Can you take a look at this directory and verify that everything looks as it should (and/or ping others to take a look)? I would like to start the process of merging upstream submodule PRs so we can move this PR forward.

The expdir for the experiment can be found here: /scratch3/NCEPDEV/stmp/David.Huber/para_gsinfo/expdir/gsinfo

It is worth noting that the gdas_analdiag job failed for this case. I have not investigated the cause of the failure yet, but it suggests that the diag jobs will need to be reworked some to support dynamic GSI *info files. I'm not sure if this needs to happen for this PR or if this can happen in a follow-up. What do you think?

DavidHuber-NOAA avatar Sep 30 '25 19:09 DavidHuber-NOAA

I don't think we can let this get merged in if it breaks analdiag

CoryMartin-NOAA avatar Sep 30 '25 20:09 CoryMartin-NOAA

@CoryMartin-NOAA just clarifying that this only breaks ‘analdiag’ if the ‘build_gsinfo’ is enabled, which it is not by default. I’m fine with continuing to work on this issue if that’s still a blocking issue.

DavidHuber-NOAA avatar Sep 30 '25 21:09 DavidHuber-NOAA

@DavidHuber-NOAA got it, I misunderstood. As long as it passes with the default/old configuration, that's fine with me

CoryMartin-NOAA avatar Sep 30 '25 21:09 CoryMartin-NOAA

OK that makes sense. Thanks @DavidHuber-NOAA

jswhit avatar Sep 30 '25 21:09 jswhit

Unfortunately, I don't have RDHPCS access yet (after my recent re-hire) so I can't see those files. I do still have access to orion, so if you can copy the files there (including the log from the failed analdiag step) I can take a look.

jswhit avatar Sep 30 '25 21:09 jswhit

@jswhit2 sure, I've copied over the expdir, comroot, and anal.2297506 directories to Orion here: /work2/noaa/global/dhuber/gsinfo.

DavidHuber-NOAA avatar Oct 01 '25 13:10 DavidHuber-NOAA

The analdiag step fails because a diag_tcp file is not created. This should be fixed by https://github.com/jswhit2/global-workflow/commit/c3518bb392a4ca73c24c156fa54d821dc35decfd

jswhit2 avatar Oct 01 '25 21:10 jswhit2

Checked the satinfo, ozinfo and convinfo files in @DavidHuber-NOAA's test and compared them to the files in gfsv17_historical. Aside from one error in avhrr3_metop-c (which is now fixed in build_gsinfo-fix), the files are functionally equivalent (but not identical). The only differences are for instruments that no longer exist, or are turned off.

jswhit2 avatar Oct 01 '25 21:10 jswhit2

@jswhit thanks for the fixes! That did get the gdas_analdiag job to run and the rest of the cycle ran successfully as well.

DavidHuber-NOAA avatar Oct 02 '25 14:10 DavidHuber-NOAA

@jswhit2 @DavidHuber-NOAA - Is the GSI hash in this branch correct? I'm not able to clone/build:

fatal: Fetched in submodule path 'gsi_enkf.fd', but it did not contain 1460f419c700dadb706c3c85f2cbe71cc280aab6. Direct fetching of that commit failed.

GitHub is also a bit confused about the hash.

CatherineThomas-NOAA avatar Oct 14 '25 17:10 CatherineThomas-NOAA

@CatherineThomas-NOAA the branch lives in my fork. I will update the .gitmodules file to tell Git/GitHub where to find it.

DavidHuber-NOAA avatar Oct 14 '25 17:10 DavidHuber-NOAA

@CatherineThomas-NOAA It should be clonable now. I had a couple of local commits on Ursa that weren't pushed. I updated the .gitmodules file as well for good measure.
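For readers hitting a similar "did not contain <hash>" error, a generic sketch of repointing a submodule URL is shown below. The function name `set_submodule_url` is hypothetical, and the submodule name and URL are supplied by the caller; none are taken from this PR.

```bash
#!/usr/bin/env bash
# Sketch: point a submodule entry in a .gitmodules file at a different remote.
# The submodule name and URL are caller-supplied; none are taken from this PR.
set_submodule_url() {
    local gitmodules_file="$1" name="$2" url="$3"
    # git config -f edits the named file directly.
    git config -f "${gitmodules_file}" "submodule.${name}.url" "${url}"
}
```

In an existing clone, `git submodule sync` followed by `git submodule update --init --recursive` then picks up the new URL and fetches the pinned commit from it.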

DavidHuber-NOAA avatar Oct 14 '25 17:10 DavidHuber-NOAA

@DavidHuber-NOAA - I was able to clone and run successfully on GaeaC6 after the change. Thanks!

CatherineThomas-NOAA avatar Oct 16 '25 17:10 CatherineThomas-NOAA

@DavidHuber-NOAA @jswhit GSI-fix PR https://github.com/NOAA-EMC/GSI-fix/pull/50 has been successfully merged to its repo. Do you need someone else to update the hash in the GSI as well? Are there other changes in the GSI hash associated with this PR or is it just the GSI-fix hash update?

CatherineThomas-NOAA avatar Oct 31 '25 18:10 CatherineThomas-NOAA

I think it's just the GSI-fix hash in GSI that needs updating. @DavidHuber-NOAA could you update the hash in your fork and issue a PR? Once that is merged I can update the GSI hash here and this PR should be ready to merge.

jswhit avatar Oct 31 '25 18:10 jswhit

The associated GSI PR https://github.com/NOAA-EMC/GSI/pull/945 has been merged. Looks like from here we need to:

  • Update the GSI hash
  • Update the glopara staged GSI-fix directory (with new date)
  • Update gsi fix version date
  • Turn on default behavior to build_gsinfo in gfs/config.base

Did I miss anything?

CatherineThomas-NOAA avatar Nov 03 '25 20:11 CatherineThomas-NOAA