hermes-3 icon indicating copy to clipboard operation
hermes-3 copied to clipboard

`neutral_mixed` MMS tests and expanded differential operator MMS tests

Open mrhardman opened this issue 8 months ago • 3 comments

The changes included here aim to significantly expand the tests available in tests/mms. The following features are added:

  • symbolic operators in the tests/mms/perpendicular_laplacian.py module covering all of the differential operators that are present in neutral_mixed.
  • extension of nonorthogonal_test.py and orthogonal_test.py to test the newly added differential operators. This requires some changes to main.cxx.
  • an automatic test of the right hand side of the equation solved in neutral_mixed. We check that the definition of the right hand side matches the expected convergence order. Note that the expected convergence order is second order (2) when evolve_momentum = false, but the observed convergence order is first order or worse when evolve_momentum = true.
  • ability to add an external source of momentum to neutrals in neutral_mixed.
  • ability to read normalised sources in neutral_mixed.

The major addition is the test script neutral_mixed_ddt_test.py, which runs the following four tests with prescribed functions Nd, Pd, NVd that are constructed to be functions of (x,y) and satisfy the boundary conditions imposed in neutral_mixed.

  • neutral_mixed/evolve_momentum_False_conservation_test_False. Check the definition of the RHS when evolve_momentum = false, by calculating symbolically ddt(Nd) and ddt(Pd), and using these symbolic expressions as sources to cancel the numerical ddt(Nd) and ddt(Pd), such that the output ddt(Nd) and ddt(Pd) are of order the discretisation error. The spatial resolution is scanned to check convergence, with the following results. fig_0 fig_1

  • neutral_mixed/evolve_momentum_False_conservation_test_True. Check that the RHS when evolve_momentum = false conserves Nd and Pd. The spatial resolution is scanned to check convergence, with the following results. Note that density is exactly conserved but pressure is only conserved within numerical error. A volume integration is done over ddt(Nd) and ddt(Pd) (without a symbolic source) to obtain these errors. fig_0 fig_1

  • neutral_mixed/evolve_momentum_True_conservation_test_False. Check the definition of the RHS when evolve_momentum =true, by calculating symbolically ddt(Nd), ddt(NVd) and ddt(Pd), and using these symbolic expressions as sources to cancel the numerical ddt(Nd), ddt(NVd)and ddt(Pd), such that the output ddt(Nd), ddt(NVd)and ddt(Pd) are of order the discretisation error. The spatial resolution is scanned to check convergence, with the following results. fig_1 fig_2 fig_0

  • neutral_mixed/evolve_momentum_True_conservation_test_True. Check that the RHS when evolve_momentum = true conserves Nd and energy. The spatial resolution is scanned to check convergence, with the following results. Note that conservation is first order accurate for density and less than first order for energy. fig_0 fig_1

These tests can be readily extended to test the ddt() of other physics modules, or to 3D (x,y,z), allowing for rapid testing of the definition of the equations solved.

Potential issues requiring further work:

  • [x] Flag documentation issues, with respect to dimensionally incorrect documentation for momentum equation for neutrals, factors of mass in viscous coefficients. (See #328 for notes).
  • [ ] Determine why energy conservation was not observed.
  • [ ] Extend nonorthogonal operators to charged species.
  • [ ] Demonstrate neutral_mixed working smoothly with nonorthogonal mesh in realistic physics case.
  • [ ] Improve readme and documentation for tests -- reviewers please make suggestions.
  • [ ] Anything else raised by reviewers.

mrhardman avatar Apr 16 '25 09:04 mrhardman

The reason the CIs keep failing on this is that you're using the CVODE solver but SUNDIALS isn't available in your CI image. I've switched it over to PVODE, which seems to work. I've also updated the MMS testing script to print the output when there is an error, which should make such issues easier to diagnose in future.

cmacmackin avatar Jul 29 '25 17:07 cmacmackin

I'm struggling to work out why the CI is continuing to fail for neutral_mixed_ddt_test.py. It always seems to occur in the post-processing stages, but not consistently in the same place. When error messages are produced they indicate that it is related to the xarray objects reading netCDF data. I can not reproduce any of these locally (even when running with act).

Run 1947

Error seems to occur after reading in and printing the second BOUT++ dataset for the final test case. Presumably it actually happens while reading the third (and final) dataset. Reported as a segfault. No backtrace is provided.

...

[Pd]
function = (0.2*x^3 - 0.3*x^2 + 0.1)*sin(2*y) + 1.0
bndry_core = neumann
bndry_all = neumann

[NVd]
function = (x^4 - 2*x^3 + x^2)*sin(y)
bndry_core = neumann
bndry_all = neumann



93% tests passed, 1 tests failed out of 14

Total Test time (real) = 220.70 sec

The following tests FAILED:
	 14 - rhs_mms_neutral_mixed (SEGFAULT)

Run 1948

Error seemed to occurred when converting l2norm to a NumPy array for the final test case. This is after all of the datasets have been read in.

Traceback (most recent call last):
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/common.py", line 158, in __float__
    return float(self.values)
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/dataarray.py", line 797, in values
    return self.variable.values
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/variable.py", line 530, in values
    return _as_array_or_item(self._data)
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/variable.py", line 315, in _as_array_or_item
    data = np.asarray(data)
  File "/home/runner/.local/lib/python3.10/site-packages/dask/array/core.py", line 1729, in __array__
    x = self.compute()
  File "/home/runner/.local/lib/python3.10/site-packages/dask/base.py", line 373, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/runner/.local/lib/python3.10/site-packages/dask/base.py", line 681, in compute
    results = schedule(expr, keys, **kwargs)
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/indexing.py", line 577, in __array__
    return np.asarray(self.get_duck_array(), dtype=dtype)
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/indexing.py", line 580, in get_duck_array
    return self.array.get_duck_array()
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/indexing.py", line 799, in get_duck_array
    return self.array.get_duck_array()
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/indexing.py", line 654, in get_duck_array
    array = self.array[self.key]
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 103, in __getitem__
    return indexing.explicit_indexing_adapter(
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/indexing.py", line 1023, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 116, in _getitem
    array = getitem(original_array, key)
  File "src/netCDF4/_netCDF4.pyx", line 5055, in netCDF4._netCDF4.Variable.__getitem__
  File "src/netCDF4/_netCDF4.pyx", line 4631, in netCDF4._netCDF4.Variable.shape.__get__
  File "src/netCDF4/_netCDF4.pyx", line 3786, in netCDF4._netCDF4.Dimension.__len__
  File "src/netCDF4/_netCDF4.pyx", line 2164, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/work/hermes-3/hermes-3/build/tests/mms/neutral_mixed_ddt_test.py", line 176, in <module>
    success, output_message_3 = run_neutral_mixed_manufactured_solutions_test(test_input)
  File "/home/runner/work/hermes-3/hermes-3/build/tests/mms/job_functions.py", line 442, in run_neutral_mixed_manufactured_solutions_test
    l2norm = np.array(l2norm)
ValueError: setting an array element with a sequence.


93% tests passed, 1 tests failed out of 14

Total Test time (real) = 220.37 sec

The following tests FAILED:
	 14 - rhs_mms_neutral_mixed (Failed)

Run 1949

Same as Run 1947.

Run 1950

Error in final test case. I think this happens when turning the l2error into a NumPy array, again.

Exception ignored in: <function CachingFileManager.__del__ at 0x7fcd20379ab0>
Traceback (most recent call last):
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 250, in __del__
    self.close(needs_lock=False)
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 234, in close
    file.close()
  File "src/netCDF4/_netCDF4.pyx", line 2669, in netCDF4._netCDF4.Dataset.close
  File "src/netCDF4/_netCDF4.pyx", line 2636, in netCDF4._netCDF4.Dataset._close
  File "src/netCDF4/_netCDF4.pyx", line 2164, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error


93% tests passed, 1 tests failed out of 14

Total Test time (real) = 220.34 sec

The following tests FAILED:
	 14 - rhs_mms_neutral_mixed (SEGFAULT)

Run 1955

Same as 1947 and 1950, except this time there is a backtrace:

Exception ignored in: <function CachingFileManager.__del__ at 0x7facddacdab0>
Traceback (most recent call last):
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 250, in __del__
    self.close(needs_lock=False)
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 234, in close
    file.close()
  File "src/netCDF4/_netCDF4.pyx", line 2669, in netCDF4._netCDF4.Dataset.close
  File "src/netCDF4/_netCDF4.pyx", line 2636, in netCDF4._netCDF4.Dataset._close
  File "src/netCDF4/_netCDF4.pyx", line 2164, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error


93% tests passed, 1 tests failed out of 14

Total Test time (real) = 222.76 sec

The following tests FAILED:
	 14 - rhs_mms_neutral_mixed (SEGFAULT)

Run 1956

Again, seems to happen when converting the l2norm to a NumPy array for the final test case. This time there is no backtrace, just a low-level memory error.

malloc_consolidate(): unaligned fastbin chunk detected


93% tests passed, 1 tests failed out of 14

Total Test time (real) = 219.69 sec

The following tests FAILED:
	 14 - rhs_mms_neutral_mixed (Subprocess aborted)
Error: Process completed with exit code 8.

Some reflections

The evenly numbered runs are those labelled as being for the "pull request", while the odd number of runs are labelled as for the "push". In reality, both run exactly the same workflow. However, the "pull request" jobs fail when converting the l2norm to a NumPy array while the "push" jobs fail slightly earlier, while reading in the third dataset. The error messages that get printed vary from run to run, but not are particularly helpful in figuring out what is going wrong.

cmacmackin avatar Jul 31 '25 16:07 cmacmackin

Thanks @cmacmackin for your investigation so far. Maybe @ZedThree could chime in once he's back from leave - this seems particularly subtle..

mikekryjak avatar Aug 01 '25 08:08 mikekryjak