Oceananigans.jl icon indicating copy to clipboard operation
Oceananigans.jl copied to clipboard

Don't separate the pressure into hydrostatic and nonhydrostatic in `NonhydrostaticModel`

Open tomchor opened this issue 1 year ago • 39 comments

This PR changes NonhydrostaticModel to use a single pressure variable. Prior to this PR, pressure is decomposed into a "hydrostatic anomaly" and a "non-hydrostatic" component. The vertical momentum equation in NonhydrostaticModel is

$$ \partial_t w = - \partial_z p + b + G_w' $$

where $G_w'$ collects the contributions to the vertical momentum tendency other than pressure gradient and buoyancy. Prior to this PR, the kinematic pressure $p$ is decomposed into (note that in this Boussinesq formulation, a hydrostatic component has already been removed from $p$)

$$ p = p_h' + p_n $$

with the hydrostatic anomaly $p_h'$ defined by

$$ p_h' = - \int^0_z b \mathrm{d} z $$

Using the hydrostatic pressure anomaly eliminates buoyancy from the vertical momentum equation, such that

$$ \partial_t w = - \partial_z p_n + G_w' $$

Buoyancy forces enter the dynamics via horizontal gradients of the hydrostatic pressure anomaly. For example, x-gradient of the kinematic pressure becomes

$$ \partial_x p = \partial_x p_h' + \partial_x p_n $$

This decomposition is advantageous for two reasons. First, in a hydrostatic model the vertical momentum equation reduces to the equation for $p_h'$. This means that switching from a hydrostatic to non-hydrostatic model is particularly simple given this decomposition.

Second --- and this must be evaluated --- it's possible to carefully control the evaluation of the hydrostatic integral so that a resting stratified fluid remains at rest, even in the presence of complex bathymetry. When we use the "MITgcm algorithm" we achieve this perfectly, even with partial cell bathymetry.

This PR proposes to eliminate the pressure decomposition so that there is only one pressure. In principle, this has a computational advantage because the hydrostatic pressure integral does not need to be evaluated (in practice, this computation has a negligible cost). It also reduces the number of memory loads that take place in the momentum advection kernels (though these are typically domained by advection scheme, so this may not matter except for centered advection schemes). Also in principle, it would allow 3D domain decompositions for distributed computations, in addition to 2D (but again, these are rarely used because typical ocean domains are shallow and wide, rather than deep and narrow). Having a single pressure also simplifies diagnostics. Finally, and perhaps most importantly, we can avoid allocating memory for an additional 3D field. In the absolute best case scenario of a model with no tracers and pure implicit dissipation, this means we go from 14 3D fields (9 for prognostic momentum + tendencies, 4 (?) for nonhydrostatic pressure including scratch variables for FFTs, and 1 for hydrostatic pressure) to 13 3D fields. So it saves about 7%. In more typical situations with LES closure and one active tracer, the savings is more marginal: we go from 19 3D fields to 18 3D fields, and thus have 5% more memory. Note also that more scratch variables are needed for FFTs in domains that are not horizontally-periodic. (This memory bookkeeping should be double checked.)

In summary, the main advantages and risks of this PR are:

Risk: loss of accuracy in scenarios that are dominated by hydrostatic balance. This issues may be exacerbated by experimental or not-yet-existing features, such as: cut cell and partial cell bathymetry, curvilinear grids, nonlinear free surfaces, reduced precision arithemetic, etc.

Advantage: memory savings of at most 7% but more typically 1-5%, and a cleaner code and user interface.

Note, there was another attempt to coalesce the pressures in https://github.com/CliMA/Oceananigans.jl/pull/1910. However, buoyancy was not reconstructed properly in the momentum equations (buoyancy is at tracer points; thus the buoyancy force has to be interpolated to by included in the vertical momentum balance). and thus the discretization was incorrect and produced spurious dynamics. This bug was fixed by https://github.com/CliMA/Oceananigans.jl/pull/3079.

In all of our tests so far, the "dynamics seem clean". However, it's not clear whether there are unforeseen issues in scenarios that we haven't tested, or rather are impossible to test because the feature does not exist yet (such as accurate reduced precision algorithm or nonlinear free surface). Thus we should consider this PR carefully.

tomchor avatar Apr 21 '23 17:04 tomchor

Docs are previewing here: https://clima.github.io/OceananigansDocumentation/previews/PR3080/

I checked all the examples with NonhdyrostaticModel and they all look the same as they do on the stable branch.

Furthermore, the few tests failures that we have are all something like

JLD2 output writer [CPU]: Test Failed at /var/lib/buildkite-agent/builds/tartarus-1/clima/oceananigans/test/test_jld2_output_writer.jl:131
--
  | Expression: wu == zero(FT)
  | Evaluated: -3.009265538105056e-35 == 0.0

i.e. very small approximation errors that aren't indicative of any significant errors in the model.

In other words, I think this is working well! I vote we simplify the model and get rid of the hydrostatic separation.

@glwagner as you mentioned, this isn't a trivial change. If you wanna move forward with it, feel free to push to this PR or close this one and open another. I can also help if you want, just lmk what I should focus on.

PS: Just like we did in https://github.com/CliMA/Oceananigans.jl/pull/1910 we might need/want to replace the stratified_fluid_remains_at_rest_with_tilted_gravity_buoyancy_tracer() test for something simpler.

tomchor avatar Apr 21 '23 22:04 tomchor

annoying that the tests need a little refactoring too, but maybe its all for the best.

glwagner avatar Apr 24 '23 16:04 glwagner

With this last commit both the hydrostatic and nonhydrostatic models are working locally for me and they only have a model.pressure field; no more NamedTuples of pressure. But let's see if tests are passing.

The PressureField functions (formerly PressureFields) should probably be moved from field_tuples.jl since they don't return tuples anymore. Any suggestion as to where they might be moved?

tomchor avatar May 12 '23 17:05 tomchor

I think code-wise, this is pretty much ready (save some occasional polishing like here). pHY and pNHS are no more, and both both hydrostatic and nonhydrostaic models, the pressure is simply model.pressure.

The only tests that are failing and some regression tests, whose data will have to be re-done, and a scalar-index issue on GPUs.

I think the only major change that's left is the documentation. @glwagner, in the past you preferred to make big changes to the docs yourself. Do you wanna remove the pressure separation part from the docs and push it here?

tomchor avatar May 16 '23 19:05 tomchor

What is the difference in the regression tests between main and this PR?

glwagner avatar May 22 '23 18:05 glwagner

What is the difference in the regression tests between main and this PR?

afaik nothing. I think the change in the pressure algorithm causes very small differences in the solution which are enough to make the regression tests fail.

tomchor avatar May 22 '23 18:05 tomchor

What are the differences?

glwagner avatar May 22 '23 18:05 glwagner

What are the differences?

I haven't plotted/analyzed the solutions for the regression tests carefully yet (tbh I'm not familiar yet with how the regression tests work and how the data is stored), but I have carefully compared my own simulations and a couple of the examples using main and this branch.

The only differences I see are the usual turbulence IC "issue" where small disturbances in the flow amplify and lead to a different solution (pointwise speaking) but with the same statistics, qualitative behavior, etc.

Given my unfamiliarity with the regression tests I'm not sure how to proceed solving the tests. So I'd appreciate some guidance here on how to move forward to get all the tests passing.

tomchor avatar May 23 '23 01:05 tomchor

Don't the regression tests output text indicating how many grid points are different, and what the maximum differences are?

glwagner avatar May 23 '23 14:05 glwagner

Don't the regression tests output text indicating how many grid points are different, and what the maximum differences are?

Yes! Sorry, is that what you were asking? I apologize, I thought you were asking about what is causing or what is the nature of the differences.

Here's a snippet from the thermal bubble test, which is the worst offender, as an example:

  | [2023/05/22 15:07:12.173] INFO  Δu: min=-1.559869e-13, max=+1.317666e-13, mean=+7.650904e-22, absmean=+2.393027e-14, std=+3.376841e-14 (3926/4096 matching grid points)
  | [2023/05/22 15:07:12.174] INFO  Δv: min=-1.339960e-13, max=+1.403664e-13, mean=+1.756466e-22, absmean=+1.796207e-14, std=+2.558788e-14 (3971/4096 matching grid points)
  | [2023/05/22 15:07:12.175] INFO  Δw: min=-4.886854e-13, max=+5.149812e-13, mean=+2.845466e-16, absmean=+5.854449e-14, std=+8.803219e-14 (4318/4352 matching grid points)
  | [2023/05/22 15:07:12.176] INFO  ΔT: min=-1.303846e-12, max=+1.495692e-12, mean=-4.293441e-17, absmean=+1.875657e-13, std=+2.728047e-13 (4096/4096 matching grid points)
  | [2023/05/22 15:07:12.177] INFO  ΔS: min=-4.632739e-12, max=+5.300649e-12, mean=-1.405126e-16, absmean=+6.662951e-13, std=+9.690186e-13 (4096/4096 matching grid points)
  | Thermal bubble [CPU, vertically_unstretched grid]: Test Failed at /var/lib/buildkite-agent/builds/tartarus-9/clima/oceananigans/test/regression_tests/thermal_bubble_regression_test.jl:76
  | Expression: all(test_fields.u .≈ correct_fields.u)
  | Stacktrace:
  | [1] macro expansion
  | @ /storage5/buildkite-agent/julia-1.8.5/share/julia/stdlib/v1.8/Test/src/Test.jl:464 [inlined]
  | [2] run_thermal_bubble_regression_test(arch::CPU, grid_type::Symbol)
  | @ Main ~/builds/tartarus-9/clima/oceananigans/test/regression_tests/thermal_bubble_regression_test.jl:76
  | Thermal bubble [CPU, vertically_unstretched grid]: Test Failed at /var/lib/buildkite-agent/builds/tartarus-9/clima/oceananigans/test/regression_tests/thermal_bubble_regression_test.jl:77
  | Expression: all(test_fields.v .≈ correct_fields.v)
  | Stacktrace:
  | [1] macro expansion
  | @ /storage5/buildkite-agent/julia-1.8.5/share/julia/stdlib/v1.8/Test/src/Test.jl:464 [inlined]
  | [2] run_thermal_bubble_regression_test(arch::CPU, grid_type::Symbol)
  | @ Main ~/builds/tartarus-9/clima/oceananigans/test/regression_tests/thermal_bubble_regression_test.jl:77
  | Thermal bubble [CPU, vertically_unstretched grid]: Test Failed at /var/lib/buildkite-agent/builds/tartarus-9/clima/oceananigans/test/regression_tests/thermal_bubble_regression_test.jl:78
  | Expression: all(test_fields.w .≈ correct_fields.w)

You can see that u, v, and w fail, but the errors are really small and most points are a match. As far as I can tell, for other regression simulations only w fails and the results seem to be the same both on the CPU and GPU.

tomchor avatar May 23 '23 15:05 tomchor

ok great, that is nice. The differences are larger than eps but only barely

glwagner avatar May 23 '23 16:05 glwagner

I'd be interested in @christophernhill and @jm-c thoughts on this before merging

glwagner avatar May 23 '23 16:05 glwagner

@xkykai do you think you could run some immersed boundary tests with this branch to make sure this change doesn’t affect your work? I think we’re interested in both performance and making sure the solution is high quality.

glwagner avatar Jun 08 '23 14:06 glwagner

and a scalar-index issue on GPUs.

What's the scalar indexing issue?

glwagner avatar Jun 08 '23 15:06 glwagner

@xkykai do you think you could run some immersed boundary tests with this branch to make sure this change doesn’t affect your work? I think we’re interested in both performance and making sure the solution is high quality.

Do you mean using the immersed pressure solver in this branch, and comparing the solution this produces with the one before this change?

xkykai avatar Jun 08 '23 15:06 xkykai

and a scalar-index issue on GPUs.

What's the scalar indexing issue?

It's already fixed. It was a test that was failing because we were comparing point-wise values. If I find the exact lines I'll post them here.

tomchor avatar Jun 08 '23 16:06 tomchor

In order to build confidence and move forward with this PR, here's a comparison between the same simulations on main and on this branch. I'm using a channel set-up (Bounded, Periodic, Bounded topology ) to simulate a front with Smagorinsky closure, using only a surface buoyancy flux as forcing.

Here's how it looks on the main branch:

https://github.com/CliMA/Oceananigans.jl/assets/13205162/47b7ffe1-5e59-4459-977d-db9ce2ee3644

The artifacts at the bottom are due to the fact that I'm using a stretched grid that gets pretty coarse as you move away from the surface. And here is the same simulation, but run on this branch:

https://github.com/CliMA/Oceananigans.jl/assets/13205162/2fd8c124-4dd6-42c6-bef9-5c16bd55dd4e

which looks pretty much the same.

In addition, every averaged quantity that I've tested looks almost exactly the same between both branches. Here's $w'b'$ as an example:

image

Also I should mention that I've been using this branch for a few weeks, and so far I haven't noticed anything different from the main or suspicious in any way.

@glwagner with this comparison (and the many others I've run for my own research), plus the regression tests being off just by approximately eps(), plus all the examples in docs looking the same, I personally feel pretty confident that this branch is working as intended. Please let me know if there's any other tests that need to be done before we move forward here!

tomchor avatar Jun 11 '23 22:06 tomchor

@tomchor the description of this issue is a little vague. The dynamics weren't "weird" --- the problem was a bug in reconstructing buoyancy in the momentum equations, right? It'd be nice to add a little more explicit description of the original issue. I think there is a PR that fixed the bug that we can point to as well.

glwagner avatar Jun 29 '23 16:06 glwagner

@tomchor the description of this issue is a little vague. The dynamics weren't "weird" --- the problem was a bug in reconstructing buoyancy in the momentum equations, right? It'd be nice to add a little more explicit description of the original issue. I think there is a PR that fixed the bug that we can point to as well.

The issue is that the videos aren't available anymore and, apart from the internal wave packet example, I don't really remember what was happening well enough to be able describe it objectively.

That said, I did my best to make the description more clear and linked the PR that solved the bug.

tomchor avatar Jul 02 '23 20:07 tomchor

@tomchor the description of this issue is a little vague. The dynamics weren't "weird" --- the problem was a bug in reconstructing buoyancy in the momentum equations, right? It'd be nice to add a little more explicit description of the original issue. I think there is a PR that fixed the bug that we can point to as well.

The issue is that the videos aren't available anymore and, apart from the internal wave packet example, I don't really remember what was happening well enough to be able describe it objectively.

That said, I did my best to make the description more clear and linked the PR that solved the bug.

The issue was that buoyancy was not reconstructed properly in the vertical. (There was also a problem with reconstructing buoyancy in the horizontal, but this only affects tilted domains.) So there was a bug and the discretization was incorrect.

glwagner avatar Jul 05 '23 15:07 glwagner

@tomchor I updated the PR description. Feel free to edit it further.

glwagner avatar Jul 05 '23 15:07 glwagner

The issue was that buoyancy was not reconstructed properly in the vertical. (There was also a problem with reconstructing buoyancy in the horizontal, but this only affects tilted domains.) So there was a bug and the discretization was incorrect.

I'm aware of that. My point was more that I don't quite remember how the issue was manifested in the dynamics (the recognition of which was what prompted us to abandon https://github.com/CliMA/Oceananigans.jl/pull/1910).

@tomchor I updated the PR description. Feel free to edit it further.

Thanks, that's a great detailed description. I only added one item to the advantages: simpler code and user interface.

tomchor avatar Jul 05 '23 17:07 tomchor

My point was more that I don't quite remember how the issue was manifested in the dynamics

That's ok. The dynamics associated with incorrect numerics are not important.

glwagner avatar Jul 05 '23 19:07 glwagner

What do you mean by cleaner code? You mean update_state!? We need all of these functions still for the hydrostatic model so I don't think on the whole there's much of a change to the source code.

glwagner avatar Jul 05 '23 19:07 glwagner

What do you mean by cleaner code? You mean update_state!? We need all of these functions still for the hydrostatic model so I don't think on the whole there's much of a change to the source code.

I'm referring to the fact that dealing with just one pressure makes for cleaner/simpler code in general. But feel free to remove that statement if you don't agree.

tomchor avatar Jul 05 '23 19:07 tomchor

Just trying to understand what was meant by that since it's being mentioned as one of the major advantages (as opposed to the many minor advantages like computational efficiency)

glwagner avatar Jul 05 '23 19:07 glwagner

I think it makes sense to merge #3188 before this PR --- if we can show that the new pressure solver in #3188 is unaffected by this PR for some examples of interest (maybe including cases dominated by hydrostatic pressure) then I think we can move forward.

glwagner avatar Aug 01 '23 17:08 glwagner

I think it makes sense to merge #3188 before this PR --- if we can show that the new pressure solver in #3188 is unaffected by this PR for some examples of interest (maybe including cases dominated by hydrostatic pressure) then I think we can move forward.

Sounds good. Do you have a sense of how close #3188 is to being ready for merging?

tomchor avatar Aug 02 '23 13:08 tomchor

I'll leave this here for the record. I'm currently experiencing the first significant dynamical difference I've seen so far between the model with and without the pressure separation.

In a simulation where I'm studying flow past an obstacle (therefore with immersed boundaries) the simulation runs fine on this branch, but (everything else being the same) doesn't finish running on main. On main the velocities keep increasing, leading to a progressive decrease in Δt to satisfy CFL condition, but the velocities keep increasing despite that, with smaller and smaller Δts. (I believe that's called a slow blow-up?)

So this is a case where the simulation fails on main, but is successful in this branch.

The simulation is far too complex to post here, but I'll try to come up with a MWE that reproduces it.

tomchor avatar Aug 17 '23 01:08 tomchor

I'll leave this here for the record. I'm currently experiencing the first significant dynamical difference I've seen so far between the model with and without the pressure separation.

In a simulation where I'm studying flow past an obstacle (therefore with immersed boundaries) the simulation runs fine on this branch, but (everything else being the same) doesn't finish running on main. On main the velocities keep increasing, leading to a progressive decrease in Δt to satisfy CFL condition, but the velocities keep increasing despite that, with smaller and smaller Δts. (I believe that's called a slow blow-up?)

So this is a case where the simulation fails on main, but is successful in this branch.

The simulation is far too complex to post here, but I'll try to come up with a MWE that reproduces it.

Interesting that is with immersed boundaries. The pressure solver isn't correct in that case so I'm not sure how to interpret that. The error should be smaller on main (because it's only the correction to the hydrostatic anomaly that is wrong on main).

The main uncertainty is how this PR will interact with #3188. We could explore using the new immersed pressure solver on this branch to test that out.

glwagner avatar Aug 17 '23 10:08 glwagner