Runs with MPAS-A dycore and CAM7 physics fail - missing variables in inic files
What happened?
Runs of the F2000dev compset on MPAS-A grids fail. This seems to be due to the combination of the MPAS-A dycore and CAM7 (a.k.a. cam_dev) physics.
The last output from a case's atm.log:
----- done assigning dimensions from Registry.xml -----
Allocating fields ...
34 MB allocated for fields on this task
4346 MB total allocated for fields across all tasks
----- done allocating fields -----
Last output from cesm.log (reorganized to show a single thread):
dec0360.hsn.de.hpc.ucar.edu 124: forrtl: severe (174): SIGSEGV, segmentation fault occurred
dec0360.hsn.de.hpc.ucar.edu 124: Image PC Routine Line Source
dec0360.hsn.de.hpc.ucar.edu 124: libpthread-2.31.s 000014BDC4E318C0 Unknown Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: cesm.exe 0000000002CAE620 mpas_io_streams_m 1037 mpas_io_streams.F
dec0360.hsn.de.hpc.ucar.edu 124: cesm.exe 0000000002B40B6D cam_mpas_subdrive 1154 cam_mpas_subdriver.F90
dec0360.hsn.de.hpc.ucar.edu 124: cesm.exe 0000000000643D5E dyn_grid_mp_dyn_g 464 dyn_grid.F90
dec0360.hsn.de.hpc.ucar.edu 124: cesm.exe 0000000000592015 cam_comp_mp_cam_i 165 cam_comp.F90
dec0360.hsn.de.hpc.ucar.edu 124: cesm.exe 000000000057ACDD atm_comp_nuopc_mp 635 atm_comp_nuopc.F90
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCC973B40 _ZN5ESMCI6FTable1 Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCC973607 ESMCI_FTableCallE Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCCC5DF85 _ZN5ESMCI2VM5ente Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCC974351 c_esmc_ftablecall Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCCEEE6E0 esmf_compmod_mp_e Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCD22F851 esmf_gridcompmod_ Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCD60C9E0 nuopc_driver_mp_l Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCD629055 nuopc_driver_mp_i Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCC973B40 _ZN5ESMCI6FTable1 Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCC973607 ESMCI_FTableCallE Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCCC5DF85 _ZN5ESMCI2VM5ente Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCC974351 c_esmc_ftablecall Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCCEEE6E0 esmf_compmod_mp_e Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCD22F851 esmf_gridcompmod_ Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCD60C9E0 nuopc_driver_mp_l Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCD628F3F nuopc_driver_mp_i Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCD63DD80 nuopc_driver_mp_i Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCC973B40 _ZN5ESMCI6FTable1 Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCC973607 ESMCI_FTableCallE Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCCC5DF85 _ZN5ESMCI2VM5ente Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCC974351 c_esmc_ftablecall Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124: libesmf.so 000014BDCCEEE6E0 esmf_compmod_mp_e Unknown Unknown
dec0360.hsn.de.hpc.ucar.edu 124:
dec0360.hsn.de.hpc.ucar.edu 124: Stack trace terminated abnormally.
What are the steps to reproduce the bug?
The easiest way is to create a case with --compset F2000dev (to get cam_dev physics) and --res mpasa120_mpasa120 (to get the MPAS-A dycore). After setting up, building, and submitting the case, the run will fail.
E.g. on Derecho:
./cime/scripts/create_newcase --case "${CASENAME}" --project "${PROJ}" --run-unsupported --compiler intel --res mpasa120_mpasa120 --compset F2000dev
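The remaining steps are the standard CIME workflow (a sketch, assuming CASENAME is the path passed to --case above):

cd "${CASENAME}"
./case.setup
./case.build
./case.submit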
What CAM tag were you using?
cam6_3_148
What machine were you running CAM on?
CISL machine (e.g. cheyenne)
What compiler were you using?
Intel
Path to a case directory, if applicable
/glade/derecho/scratch/gdicker/F2000dev_mpasa120_intel_1710435350
Will you be addressing this bug yourself?
Any CAM SE can do this
Extra info
No response
Can you confirm whether this occurs with F2000climo, a.k.a. CAM6 physics?
Are these runs with ./xmlchange DEBUG=TRUE?
Thanks.
Hi @adamrher, I can confirm that F2000climo works. I was testing the RRTMGP changes in CAM with MPAS-A, and I was able to run with F2000climo.
I have not tried with DEBUG=TRUE yet. I will update when I do.
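For reference, enabling a debug build from the case directory is a short sequence (a sketch, assuming the case has already been built once):

./xmlchange DEBUG=TRUE
./case.build --clean-all
./case.build
./case.submit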
Here's one thread's content from cesm.log for a run with DEBUG=TRUE:
dec0314.hsn.de.hpc.ucar.edu 2: ERROR:
dec0314.hsn.de.hpc.ucar.edu 2: cam_mpas_subdriver::cam_mpas_read_static: FATAL: Failed to add 2 fields to stat
dec0314.hsn.de.hpc.ucar.edu 2: ic input stream.
dec0314.hsn.de.hpc.ucar.edu 2: Image PC Routine Line Source
dec0314.hsn.de.hpc.ucar.edu 2: cesm.exe 000000000A913110 shr_abort_mod_mp_ 114 shr_abort_mod.F90
dec0314.hsn.de.hpc.ucar.edu 2: cesm.exe 000000000A912F7A shr_abort_mod_mp_ 61 shr_abort_mod.F90
dec0314.hsn.de.hpc.ucar.edu 2: cesm.exe 0000000009DF56A2 cam_mpas_subdrive 1161 cam_mpas_subdriver.F90
dec0314.hsn.de.hpc.ucar.edu 2: cesm.exe 0000000000CE1FFF dyn_grid_mp_setup 464 dyn_grid.F90
dec0314.hsn.de.hpc.ucar.edu 2: cesm.exe 0000000000CDC9B0 dyn_grid_mp_dyn_g 138 dyn_grid.F90
dec0314.hsn.de.hpc.ucar.edu 2: cesm.exe 0000000000957350 cam_comp_mp_cam_i 165 cam_comp.F90
dec0314.hsn.de.hpc.ucar.edu 2: cesm.exe 00000000008FEED9 atm_comp_nuopc_mp 635 atm_comp_nuopc.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4BF90DDA9 callVFuncPtr 2167 ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4BF90CDE8 ESMCI_FTableCallE 824 ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4BFD9DB72 enter 2318 ESMCI_VMKernel.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4BFD87010 enter 1216 ESMCI_VM.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4BF90E18F c_esmc_ftablecall 981 ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4C03ED650 esmf_compmod_mp_e 1223 ESMF_Comp.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4C0D7B8E5 esmf_gridcompmod_ 1412 ESMF_GridComp.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4C1821DFC nuopc_driver_mp_l 2889 NUOPC_Driver.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4C180A69F nuopc_driver_mp_i 1992 NUOPC_Driver.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4BF90DDA9 callVFuncPtr 2167 ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4BF90CDE8 ESMCI_FTableCallE 824 ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4BFD9DB72 enter 2318 ESMCI_VMKernel.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4BFD87010 enter 1216 ESMCI_VM.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4BF90E18F c_esmc_ftablecall 981 ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4C03ED650 esmf_compmod_mp_e 1223 ESMF_Comp.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4C0D7B8E5 esmf_gridcompmod_ 1412 ESMF_GridComp.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4C1821DFC nuopc_driver_mp_l 2889 NUOPC_Driver.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4C180A44C nuopc_driver_mp_i 1987 NUOPC_Driver.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4C17CF051 nuopc_driver_mp_i 487 NUOPC_Driver.F90
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4BF90DDA9 callVFuncPtr 2167 ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4BF90CDE8 ESMCI_FTableCallE 824 ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4BFD9DB72 enter 2318 ESMCI_VMKernel.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4BFD87010 enter 1216 ESMCI_VM.C
dec0314.hsn.de.hpc.ucar.edu 2: libesmf.so 000014C4BF90E18F c_esmc_ftablecall 981 ESMCI_FTable.C
dec0314.hsn.de.hpc.ucar.edu 2:
dec0314.hsn.de.hpc.ucar.edu 2: Stack trace terminated abnormally.
dec0314.hsn.de.hpc.ucar.edu 2: MPICH ERROR [Rank 2] [job id 5d63df0c-2c01-4c32-88d0-b8a50fe5fa22] [Thu Mar 14 11:26:20 2024] [dec0314] - Abort(1001) (rank 2 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1001) - process 2
dec0314.hsn.de.hpc.ucar.edu 2:
dec0314.hsn.de.hpc.ucar.edu 2: aborting job:
dec0314.hsn.de.hpc.ucar.edu 2: application called MPI_Abort(MPI_COMM_WORLD, 1001) - process 2
From a run on Derecho within "/glade/derecho/scratch/gdicker/F2000dev_mpasa120_intel_dbg_1710436541"
Is this just a problem with the IC file? I've run this with my own analytic IC files and cam_dev physics before. I think it just needs those two missing fields (cell_gradient_coef_x and cell_gradient_coef_y).
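If a file on the same mesh already contains these fields (for example, the MPAS static file the inic file was derived from), one hedged sketch for patching an existing inic file would be NCO's append mode; same_mesh_static.nc and target_inic.nc are placeholder names:

# append the two coefficient fields from a same-mesh file into the inic file
ncks -A -v cell_gradient_coef_x,cell_gradient_coef_y same_mesh_static.nc target_inic.nc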
As a temporary workaround, if testing without the frontogenesis gravity wave drag (?) scheme is acceptable, setting use_gw_front = false in CAM's namelist might suffice. It looks like the cell_gradient_coef_x and cell_gradient_coef_y fields are only read if use_gw_front or use_gw_front_igw are true: https://github.com/ESCOMP/CAM/blob/cam6_3_148/src/dynamics/mpas/driver/cam_mpas_subdriver.F90#L1152-L1162
Thanks @briandobbins and @mgduda for the tips.
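A minimal sketch of applying the workaround from the case directory (CIME regenerates the namelists at submit time, so no rebuild should be needed):

# disable the frontal gravity wave source and resubmit
echo "use_gw_front = .false." >> user_nl_cam
./case.submit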
Is this just a problem with the IC file?
It might be. I think only "atm/cam/inic/mpas/mpasa60_L32_notopo_coords_c230707.nc" has the cell_gradient_coef_{xy} variables, from what I checked.
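One quick way to check any given inic file (a sketch; DIN_LOC_ROOT is the machine's CIME inputdata root):

# dump the header and look for the two coefficient fields
ncdump -h ${DIN_LOC_ROOT}/atm/cam/inic/mpas/mpasa60_L32_notopo_coords_c230707.nc | grep cell_gradient_coef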
... setting use_gw_front = false in CAM's namelist might suffice....
I just tried a couple of these F2000dev MPAS-A runs with use_gw_front = .false. added to user_nl_cam, and they succeeded!
As a temporary workaround, if testing without the frontogenesis gravity wave drag (?) scheme is acceptable
This was off in CAM6, so it's not terrible to omit this process in the near term. But it should be fixed for production runs: our midlatitude jets and polar vortex are too strong, and the additional drag from turning the frontal scheme on moves the solution in the right direction.
This is less important at higher resolutions, where these waves start to become resolved.
@gdicker1 if this issue is just due to missing variables in the inic file when running the frontal scheme, should we close (or rename) this issue?
If the issue isn't fixed, I'm not sure why it should be closed. Unless someone has regenerated the files already?
@adamrher I think the issue title was fine but I changed it to "Runs with MPAS-A dycore and CAM7 physics fail - missing variables in inic files." If that still isn't what you imagined, I don't mind if the title changes again.
@gdicker1 understood. You're right, the original name still conveyed this issue. I was just confused since folks have been running cam_dev with MPAS for a while now, but the issue is that our namelist_defaults have a large number of inic files without the variables required to run cam_dev.
Hi @gdicker1. I was looking through the issues and we don't have a general issue for bringing in L58/L93 support for MPAS. This issue is related, but it doesn't encompass the entire effort, which now includes this issue: https://github.com/ESCOMP/CAM/issues/1102. I was going to open the issue but wanted to check with you first.
Only mpasa120 and mpasa480 are supported in cam_development. So I was thinking the issue could just provide support for those two grids -- hi-res and var-res can be a separate issue that we can address after supporting the coarser grids. Thoughts?
Hi @adamrher, thanks for checking. I think this sounds reasonable, especially to add other resolutions later.
Just to add some other thoughts: other times this has come up, there wasn't agreement on what the level heights should be for L58 and L93 (but I think this has been resolved). There have also been concerns about the amount of space the (high-resolution) files could take up on CESM data servers, especially since we could have three versions of a similar grid (notopo, topo, and real-data).
Short term, let's get all the 120km cases done - space isn't much of a concern there, and since it's the workhorse resolution and the one likely to be 'tested' the most, the value of having things work out of the box is big.
Longer term, for high-resolution cases, I've got some discussions going on with CISL about moving our input storage (and merging the EarthWorks & CESM datasets) on to new infrastructure that's got more, and scalable, space.
Cheers,
- Brian