CAM-MPAS: Missing mpas_block_decomp_file_prefix file gives unhelpful error message

Open MiCurry opened this issue 3 years ago • 1 comments

Background: MPAS decomposition files are created by a third-party program, gpmetis (the METIS graph partitioner). In standalone MPAS, users run gpmetis on the graph.info file of a given grid to generate a decomposition file for each number of tasks they wish to run MPAS with.

The same file is needed in CAM-MPAS, and it is specified through the user_nl_cam namelist option mpas_block_decomp_file_prefix. This option does not set the file itself, only the file prefix:

'mpasa60.graph.info.part.'

The MPAS decomposition routines then append the number of tasks to the end of the prefix and use the resulting file as the decomposition (for example, 'mpasa60.graph.info.part.36' for 36 tasks).
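
To make the naming rule concrete, here is a minimal Python sketch of how the expected file name follows from the prefix and the task count, together with the kind of explicit check that would give a clearer message. The function names and the error text are hypothetical; this is not the MPAS or CIME code, only an illustration of the rule described above.

```python
from pathlib import Path


def decomp_file_name(prefix: str, ntasks: int) -> str:
    """Build the expected file name: the prefix followed by the task count."""
    return f"{prefix}{ntasks}"


def find_decomp_file(prefix: str, ntasks: int, search_dir: str = ".") -> Path:
    """Return the path of the expected decomposition file.

    Raises FileNotFoundError with a descriptive message if the file is missing.
    (Hypothetical helper, not part of CAM, MPAS, or CIME.)
    """
    path = Path(search_dir) / decomp_file_name(prefix, ntasks)
    if not path.is_file():
        raise FileNotFoundError(
            f"No block decomposition file '{path}' for {ntasks} tasks "
            f"(mpas_block_decomp_file_prefix = '{prefix}')"
        )
    return path
```

With the prefix from this issue, `decomp_file_name('mpasa60.graph.info.part.', 36)` gives `'mpasa60.graph.info.part.36'`, and `find_decomp_file` fails with a message naming that file if it does not exist.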

When running CAM with the MPAS dycore on a processor count for which there is no corresponding block decomposition file, the following error is produced. It gives no useful information about the cause of the failure:

Traceback (most recent call last):
  File "/glade/scratch/mcurry/CESM-ESCOMP.062421/cime/scripts/Tools/../../scripts/lib/CIME/utils.py", line 1809, in run_and_log_case_status
    rv = func()
  File "/glade/scratch/mcurry/CESM-ESCOMP.062421/cime/scripts/Tools/../../scripts/lib/CIME/case/case_submit.py", line 205, in <lambda>
    batch_args=batch_args, workflow=workflow)
  File "/glade/scratch/mcurry/CESM-ESCOMP.062421/cime/scripts/Tools/../../scripts/lib/CIME/case/case_submit.py", line 133, in _submit
    case.check_case(skip_pnl=skip_pnl)
  File "/glade/scratch/mcurry/CESM-ESCOMP.062421/cime/scripts/Tools/../../scripts/lib/CIME/case/case_submit.py", line 223, in check_case
    self.check_all_input_data()
  File "/glade/scratch/mcurry/CESM-ESCOMP.062421/cime/scripts/Tools/../../scripts/lib/CIME/case/check_input_data.py", line 176, in check_all_input_data
    attributes={"CLM_USRDAT_NAME":clm_usrdat_name})
  File "/glade/scratch/mcurry/CESM-ESCOMP.062421/cime/scripts/Tools/../../scripts/lib/CIME/case/check_input_data.py", line 193, in _downloadfromserver
    protocol, address, user, passwd, _, ic_filepath = inputdata.get_next_server(attributes=attributes)
  File "/glade/scratch/mcurry/CESM-ESCOMP.062421/cime/scripts/Tools/../../scripts/lib/CIME/XML/inputdata.py", line 39, in get_next_server
    self._servernode = servernodes[0]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./case.submit", line 126, in <module>
    _main_func(__doc__)
  File "./case.submit", line 123, in _main_func
    mail_user=mail_user, mail_type=mail_type, batch_args=batch_args, workflow=workflow)
  File "/glade/scratch/mcurry/CESM-ESCOMP.062421/cime/scripts/Tools/../../scripts/lib/CIME/case/case_submit.py", line 208, in submit
    is_batch=is_batch)
  File "/glade/scratch/mcurry/CESM-ESCOMP.062421/cime/scripts/Tools/../../scripts/lib/CIME/utils.py", line 1811, in run_and_log_case_status
    custom_success_msg = custom_success_msg_functor(rv) if custom_success_msg_functor else None
  File "/glade/scratch/mcurry/CESM-ESCOMP.062421/cime/scripts/Tools/../../scripts/lib/CIME/case/case_submit.py", line 207, in <lambda>
    custom_success_msg_functor=lambda x: x.split(":")[-1],
AttributeError: 'NoneType' object has no attribute 'split'

The error can be fixed or circumvented by producing a graph.info.part.xx file that matches the expected decomposition file. Although I have not looked at the code that causes this error, my first thought is that the code searching for the file gets confused when ntasks is appended to the prefix.
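
As a rough sketch of that workaround: gpmetis takes a graph file and a number of partitions and writes its output next to the graph file as `<graph file>.part.<nparts>`. The helper names below are hypothetical, and the sketch assumes that gpmetis is on the PATH and that the graph file is named like the prefix without the trailing `.part.` (here `mpasa60.graph.info`).

```python
import subprocess
from pathlib import Path
from typing import List


def gpmetis_command(graph_file: str, ntasks: int) -> List[str]:
    """Command line that partitions graph_file into ntasks parts."""
    return ["gpmetis", graph_file, str(ntasks)]


def expected_partition_file(graph_file: str, ntasks: int) -> Path:
    """File that gpmetis writes for this graph file and partition count."""
    return Path(f"{graph_file}.part.{ntasks}")


def ensure_partition_file(graph_file: str, ntasks: int) -> Path:
    """Run gpmetis only if the partition file does not exist yet."""
    out = expected_partition_file(graph_file, ntasks)
    if not out.is_file():
        # Assumes the gpmetis executable is available on the PATH.
        subprocess.run(gpmetis_command(graph_file, ntasks), check=True)
    return out
```

For a 36-task run, `ensure_partition_file("mpasa60.graph.info", 36)` would leave `mpasa60.graph.info.part.36` in place, which is the name the prefix 'mpasa60.graph.info.part.' resolves to.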

MiCurry avatar Jul 21 '21 20:07 MiCurry

This can be assigned to me. I'll merge it into cam_mpas_dev, and perhaps we can include it in a future PR.

MiCurry avatar Jul 21 '21 20:07 MiCurry