global-workflow
The gfsgenesis task failed on Jet
What is wrong?
The error message:
+ Unknown[43]: [[ -d /gpfs/hps ]]
+ Unknown[48]: [[ -L /usrx ]]
+ Unknown[53]: [[ -d /scratch2 ]]
+ Unknown[58]: [[ -d /work ]]
+ Unknown[63]: [[ -d /lfs3 ]]
+ Unknown[68]: [[ -d /lfs/h1 ]]
+ Unknown[74]: machine=unknown
+ Unknown[74]: export machine
+ Unknown[75]: echo Job failed: unknown platform
+ Unknown[75]: 1>& 2
Job failed: unknown platform
+ Unknown[76]: err_exit 'FAILED genesis.1116443 - ERROR IN unknown platform - ABNORMAL EXIT'
-------------------------------------------------------------
-- FATAL ERROR: FAILED genesis.1116443 - ERROR IN unknown platform - ABNORMAL EXIT
-- ABNORMAL EXIT at Sat May 25 04:16:36 UTC 2024 on k3
-------------------------------------------------------------
What should have happened?
The gfsgenesis task should complete successfully
What machines are impacted?
Jet
Steps to reproduce
Running a forecast-only experiment on Jet reproduces the error.
Additional information
The error happens in
/lfs4/HFIP/hfv3gfs/glopara/git/TC_tracker/feature-GFSv17_com_reorg/scripts/exgfs_tc_genesis.sh
where lines 128-131 read as follows:
elif [[ -d /lfs3 ]] ; then
# We are on NOAA Jet
machine=jet
${USHens_tracker}/extrkr_gen_gfs.sh ${loopnum} ${cmodel} ${pert} ${pertdir} #2>&1 >${outfile}
Jet no longer has the /lfs3 directory
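To confirm this on a given node, the directory tests from the trace can be repeated by hand. The following is only a minimal sketch: `first_existing_dir` is a hypothetical helper (not part of TC_tracker), and the directory list is copied from the `[[ -d ... ]]` tests in the log above (the `-L /usrx` symlink test is left out).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from TC_tracker): print the first existing
# directory among the arguments, mimicking the [[ -d ... ]] chain in the log.
first_existing_dir() {
  local d
  for d in "$@"; do
    if [[ -d "${d}" ]]; then
      echo "${d}"
      return 0
    fi
  done
  return 1
}

# Marker directories taken from the trace above. If none of them exists,
# the script would fall through to machine=unknown, as it does on Jet.
first_existing_dir /gpfs/hps /scratch2 /work /lfs3 /lfs/h1 \
  || echo "no known marker directory found (machine=unknown)"
```

On a Jet node without /lfs3, this should print the fallback message, which matches the failure reported above.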
Do you have a proposed solution?
Update TC_tracker to a more recent version so that it can detect Jet correctly.
Alternatively, could we update ens_tracker_ver in run.spack.ver
from
export ens_tracker_ver=feature-GFSv17_com_reorg
to
export ens_tracker_ver=v1.1.15.6
@HananehJafary-NOAA @InnocentSouopgui-NOAA FYI
Please see the following issue: https://github.com/NOAA-EMC/global-workflow/issues/2841 After the failure of /lfs4, libraries on S4 are moving to /lfs5.
The following pull request, https://github.com/NOAA-EMC/global-workflow/pull/2878 , has the fix for pretty much everything. Unfortunately, we cannot push it through yet because of a bug in one component that would affect other systems.
Also the TC_Tracker component has not moved yet, but will be moving soon.
If you want to try the version with the fix, let me know.
You can also mirror the fix in your local working copy.
For instance, you can replace /lfs3 or /lfs1 with /lfs5 in /lfs4/HFIP/hfv3gfs/glopara/git/TC_tracker/feature-GFSv17_com_reorg/scripts/exgfs_tc_genesis.sh
Proceed with caution if you want to fix your local copy, as there is no guarantee of /lfs4 being mounted on the compute nodes.
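One way to mirror the fix without touching the shared installation is to patch a private copy of the script. This is only a sketch under stated assumptions: `patch_jet_marker` is a hypothetical helper, the file names in the usage comment are placeholders, and it rewrites only the `[[ -d /lfs3 ]]` test quoted in the issue, not any /lfs1 references.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a copy of a script in which the Jet marker test
# is changed from /lfs3 to /lfs5. The source file is left untouched.
patch_jet_marker() {
  local src=$1 dst=$2
  # '#' is used as the sed delimiter so the slashes in the paths need no escaping.
  sed 's#\[\[ -d /lfs3 ]]#[[ -d /lfs5 ]]#' "${src}" > "${dst}"
}

# Example usage (placeholder file names):
#   patch_jet_marker exgfs_tc_genesis.sh exgfs_tc_genesis.local.sh
#   diff exgfs_tc_genesis.sh exgfs_tc_genesis.local.sh
```

Reviewing the diff before pointing an experiment at the patched copy helps confirm that only the platform test changed.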
I have migrated all the /lfs4 filesystem to /lfs5 for TC_Tracker on Jet. You can clone the updated package under my directory: https://github.com/HananehJafary-NOAA/tracker_package
OBE