Oceananigans.jl icon indicating copy to clipboard operation
Oceananigans.jl copied to clipboard

Fixing up shallow water model (cleaned up version of #3486)

Open francispoulin opened this issue 1 year ago • 44 comments

Closes https://github.com/CliMA/Oceananigans.jl/issues/3051

This revised the shallow water tendencies, adds a new valiation example with 1D topography and starts to get the global model working.

I'm happy to say that the near global simulation is now running!

francispoulin avatar Mar 07 '24 20:03 francispoulin

Nice! Post some videos

simone-silvestri avatar Mar 07 '24 21:03 simone-silvestri

@simone-silvestri Will do!

At the moment I am using momentum_advection = VectorInvariant(). If it goes unstable I will try the one you suggested on the other PR. I also tried what you suggested on the other PR,

VectorInvariant(vorticity_scheme = WENO(), kinetic_energy_gradient_scheme = WENO())

This failed because of the error below. Any idea what I need to do to fix this?

ERROR: LoadError: MethodError: no method matching _symmetric_interpolate_yᵃᶜᵃ(::Int64, ::Int64, ::Int64, ::ImmersedBoundaryGrid{…}, ::EnergyConserving{…}, ::typeof(Oceananigans.Advection.δx_v²), ::Field{…}, ::Field{…})

Closest candidates are:
  _symmetric_interpolate_yᵃᶜᵃ(::Any, ::Any, ::Any, ::ImmersedBoundaryGrid, ::Union{Centered{2}, Centered{3}, Centered{4}, Centered{5}, Centered{6}, UpwindBiased{2}, UpwindBiased{3}, UpwindBiased{4}, UpwindBiased{5}, UpwindBiased{6}, WENO}, ::Any...)
   @ Oceananigans ~/Software/Oceananigans.jl/src/ImmersedBoundaries/conditional_fluxes.jl:210
  _symmetric_interpolate_yᵃᶜᵃ(::Any, ::Any, ::Any, ::ImmersedBoundaryGrid, ::Union{Centered{1}, UpwindBiased{1}}, ::Any...)
   @ Oceananigans ~/Software/Oceananigans.jl/src/ImmersedBoundaries/conditional_fluxes.jl:207
  _symmetric_interpolate_yᵃᶜᵃ(::Any, ::Any, ::Any, ::Any, ::VectorInvariant{<:Any, <:Any, true}, ::Any, ::Any...)
   @ Oceananigans ~/Software/Oceananigans.jl/src/Advection/vector_invariant_advection.jl:250
  ...

francispoulin avatar Mar 07 '24 21:03 francispoulin

Some good news is that more tests seem to be passing compared to the prevoius PR.

One of the messages I saw was there was a cancellation signal, see below. Lots of the tests just cancelled, and I'm not sure why.

     Testing Oceananigans
# Received cancellation signal, interrupting

francispoulin avatar Mar 08 '24 13:03 francispoulin

@simone-silvestri Unfortunately, my run died just before 20 days. I guess the simple advection scheme that I was using without viscosity wasn't stable enough.

Could you help me to set up the one you suggested on the previous PR? I tried was you suggested but got an error, which is copied above.

francispoulin avatar Mar 11 '24 12:03 francispoulin

Some good news is that more tests seem to be passing compared to the prevoius PR.

One of the messages I saw was there was a cancellation signal, see below. Lots of the tests just cancelled, and I'm not sure why.

     Testing Oceananigans
# Received cancellation signal, interrupting

I don't know what you saw. But what I see here:

https://buildkite.com/clima/oceananigans/builds/14750

is that almost all tests pass.

navidcy avatar Mar 11 '24 12:03 navidcy

Some good news is that more tests seem to be passing compared to the prevoius PR. One of the messages I saw was there was a cancellation signal, see below. Lots of the tests just cancelled, and I'm not sure why.

     Testing Oceananigans
# Received cancellation signal, interrupting

I don't know what you saw. But what I see here:

https://buildkite.com/clima/oceananigans/builds/14750

is that almost all tests pass.

Thanks @navidcy , that does look very promising!

The error that I saw is copied below. I am not quite sure where this comes from though.

  [8dfed614] Test
      Status `/tmp/jl_pRHyI0/Manifest.toml`
[403513] signal (11.1): Segmentation fault
in expression starting at /storage5/buildkite-agent/.julia-14750/packages/CUDA_Runtime_jll/rcOoh/.pkg/platform_augmentation.jl:102
Allocations: 2907 (Pool: 2899; Big: 8); GC: 0
ERROR: failed process: Process(`/storage5/buildkite-agent/julia-1.10.2/bin/julia -C native -J/storage5/buildkite-agent/julia-1.10.2/lib/julia/sys.so -O0 -g1 --color=yes -O0 --color=no --history-file=no --startup-file=no --project=/tmp/jl_pRHyI0/Project.toml --eval 'append!(empty!(Base.DEPOT_PATH), ["/storage5/buildkite-agent/.julia-14750"])
append!(empty!(Base.DL_LOAD_PATH), String[])
cd("/storage5/buildkite-agent/.julia-14750/packages/CUDA_Runtime_jll/rcOoh/.pkg")
include("/storage5/buildkite-agent/.julia-14750/packages/CUDA_Runtime_jll/rcOoh/.pkg/select_artifacts.jl")
' -t1 --startup-file=no x86_64-linux-gnu-libgfortran5-cxx11-libstdcxx30-julia_version+1.10.2`, ProcessSignaled(11)) [0]
Stacktrace:
  [1] pipeline_error
    @ ./process.jl:565 [inlined]
  [2] read(cmd::Cmd)
    @ Base ./process.jl:449
  [3] collect_artifacts(pkg_root::String; platform::Base.BinaryPlatforms.Platform)
    @ Pkg.Operations /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:720
  [4] collect_artifacts
    @ /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:706 [inlined]
  [5] check_artifacts_downloaded(pkg_root::String; platform::Base.BinaryPlatforms.Platform)
    @ Pkg.Operations /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:764
  [6] check_artifacts_downloaded
    @ /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:763 [inlined]
  [7] is_package_downloaded(manifest_file::String, pkg::Pkg.Types.PackageSpec; platform::Base.BinaryPlatforms.Platform)
    @ Pkg.Operations /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:2195
  [8] is_package_downloaded
    @ /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:2190 [inlined]
  [9] print_status(env::Pkg.Types.EnvCache, old_env::Nothing, registries::Vector{Pkg.Registry.RegistryInstance}, header::Symbol, uuids::Vector{Base.UUID}, names::Vector{String}; manifest::Bool, diff::Bool, ignore_indent::Bool, outdated::Bool, extensions::Bool, io::Base.TTY, mode::Pkg.Types.PackageMode, hidden_upgrades_info::Bool, show_usagetips::Bool)
    @ Pkg.Operations /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:2308
 [10] print_status
    @ /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:2240 [inlined]
 [11] status(env::Pkg.Types.EnvCache, registries::Vector{Pkg.Registry.RegistryInstance}, pkgs::Vector{Pkg.Types.PackageSpec}; header::Nothing, mode::Pkg.Types.PackageMode, git_diff::Bool, env_diff::Nothing, ignore_indent::Bool, io::Base.TTY, outdated::Bool, extensions::Bool, hidden_upgrades_info::Bool, show_usagetips::Bool)
    @ Pkg.Operations /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:2474
 [12] status
    @ /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:2442 [inlined]
 [13] (::Pkg.Operations.var"#130#134"{Bool, Cmd, Cmd, Nothing, Pkg.Types.Context, Vector{Tuple{String, Base.Process}}, String, Pkg.Types.PackageSpec})()
    @ Pkg.Operations /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:1958
 [14] withenv(::Pkg.Operations.var"#130#134"{Bool, Cmd, Cmd, Nothing, Pkg.Types.Context, Vector{Tuple{String, Base.Process}}, String, Pkg.Types.PackageSpec}, ::Pair{String, String}, ::Vararg{Pair{String}})
    @ Base ./env.jl:257
 [15] (::Pkg.Operations.var"#117#122"{String, Bool, Bool, Bool, Pkg.Operations.var"#130#134"{Bool, Cmd, Cmd, Nothing, Pkg.Types.Context, Vector{Tuple{String, Base.Process}}, String, Pkg.Types.PackageSpec}, Pkg.Types.PackageSpec})()
    @ Pkg.Operations /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:1824
 [16] with_temp_env(fn::Pkg.Operations.var"#117#122"{String, Bool, Bool, Bool, Pkg.Operations.var"#130#134"{Bool, Cmd, Cmd, Nothing, Pkg.Types.Context, Vector{Tuple{String, Base.Process}}, String, Pkg.Types.PackageSpec}, Pkg.Types.PackageSpec}, temp_env::String)
    @ Pkg.Operations /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:1705
 [17] (::Pkg.Operations.var"#115#120"{Dict{String, Any}, Bool, Bool, Bool, Pkg.Operations.var"#130#134"{Bool, Cmd, Cmd, Nothing, Pkg.Types.Context, Vector{Tuple{String, Base.Process}}, String, Pkg.Types.PackageSpec}, Pkg.Types.Context, Pkg.Types.PackageSpec, String, Pkg.Types.Project, String})(tmp::String)
    @ Pkg.Operations /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:1794
 [18] mktempdir(fn::Pkg.Operations.var"#115#120"{Dict{String, Any}, Bool, Bool, Bool, Pkg.Operations.var"#130#134"{Bool, Cmd, Cmd, Nothing, Pkg.Types.Context, Vector{Tuple{String, Base.Process}}, String, Pkg.Types.PackageSpec}, Pkg.Types.Context, Pkg.Types.PackageSpec, String, Pkg.Types.Project, String}, parent::String; prefix::String)
    @ Base.Filesystem ./file.jl:766
 [19] mktempdir(fn::Function, parent::String)
    @ Base.Filesystem ./file.jl:762
 [20] mktempdir
    @ ./file.jl:762 [inlined]
 [21] sandbox(fn::Function, ctx::Pkg.Types.Context, target::Pkg.Types.PackageSpec, target_path::String, sandbox_path::String, sandbox_project_override::Pkg.Types.Project; preferences::Dict{String, Any}, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
    @ Pkg.Operations /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:1752
 [22] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool)
    @ Pkg.Operations /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:1955
 [23] test
    @ /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:1899 [inlined]
 [24] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, test_fn::Nothing, julia_args::Cmd, test_args::Cmd, force_latest_compatible_version::Bool, allow_earlier_backwards_compatible_versions::Bool, allow_reresolve::Bool, kwargs::@Kwargs{io::Base.TTY})
    @ Pkg.API /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/API.jl:444
 [25] test(pkgs::Vector{Pkg.Types.PackageSpec}; io::Base.TTY, kwargs::@Kwargs{})
    @ Pkg.API /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/API.jl:159
 [26] test(pkgs::Vector{Pkg.Types.PackageSpec})
    @ Pkg.API /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/API.jl:148
 [27] test(; name::Nothing, uuid::Nothing, version::Nothing, url::Nothing, rev::Nothing, path::Nothing, mode::Pkg.Types.PackageMode, subdir::Nothing, kwargs::@Kwargs{})
    @ Pkg.API /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/API.jl:174
 [28] test()
    @ Pkg.API /storage5/buildkite-agent/julia-1.10.2/share/julia/stdlib/v1.10/Pkg/src/API.jl:165
 [29] top-level scope
    @ none:1
🚨 Error: The command exited with status 1
user command error: exit status 1

francispoulin avatar Mar 11 '24 13:03 francispoulin

Hey @francispoulin, sorry for the mistake. We probably should change the way we specify the methods for vector invariant in a vertically Flat domain because it is a bit messy. The correct way to do it would be either

using Oceananigans.Advection: OnlySelfUpwinding
VectorInvariant(vorticity_scheme = WENO(), kinetic_energy_gradient_scheme = WENO(), upwinding = OnlySelfUpwinding(; cross_scheme = WENO())

or

VectorInvariant(vorticity_scheme = WENO(), divergence_scheme = WENO())

This last one is easier and works because of how the defaults are defined. It makes sense in 3D (where you want kinetic_energy_scheme == divergence_scheme) but is misleading in the shallow water equations where we do not have a divergence in the momentum equations. Remember to pass also mass_advection = WENO()

simone-silvestri avatar Mar 11 '24 15:03 simone-silvestri

Thanks! I have tried that and it seems to run. I will restart my simulation. It's pretty slow as it did 20 days in about 1 real day but I'll see how far it gets before I need to continue it.

francispoulin avatar Mar 11 '24 15:03 francispoulin

Run on a GPU?

navidcy avatar Mar 11 '24 15:03 navidcy

Run on a GPU?

No! Thanks for asking and I realized I was testing on a CPU. Not to get it to run on a GPU.

francispoulin avatar Mar 12 '24 14:03 francispoulin

I have had some success but am still having difficulties with the height field becoming negative and then a numerical instability developing.

Of all the things I have tried so far, thanks @simone-silvestri for all the suggestions, the following scheme was able to go the furthest, just after 50 days.

momentum_advection = VectorInvariant(vorticity_scheme = WENO(), kinetic_energy_gradient_scheme = WENO(), vertical_scheme = Centered())

This is the resulting movie. Certainly things are energetic in the shallow ocean, and presumably why difficulties are developing.

https://github.com/CliMA/Oceananigans.jl/assets/8239041/63565016-72d2-4f3b-a98e-28601d6a7991

francispoulin avatar Mar 12 '24 23:03 francispoulin

Looks like there is a problem near physical (not immersed) boundaries for the height.

simone-silvestri avatar Mar 12 '24 23:03 simone-silvestri

Looks like also in the original case there was the same problem. But for some reason, the simulation was not crashing.

https://github.com/CliMA/Oceananigans.jl/assets/33547697/8a361b8c-2a31-4129-a20b-dba7bb508e2b

simone-silvestri avatar Mar 12 '24 23:03 simone-silvestri

Thanks @simone-silvestri for the observations. I will think about that tomorrow.

Today I also tried to do a flat bottom run, just to see what would happen with no bathymetry.

I added the last line below, which I thought would set the bottom of the ocean to 5km below the free-surface. Unfortunately, when I ran the simulation nothing happened after even a year.

bat = file_bathymetry["bathymetry"]
boundary = Int.(bat .> 0)
bat[ bat .> 0 ] .= 0 
bat = -bat
bat = 0*bat .- 5e3

This clearly didn't work. Any observations as to what I did wrong?

francispoulin avatar Mar 13 '24 00:03 francispoulin

I have looked at the lines of code I copied above and understand the following:

  1. Read in bathymetry from a file. These have values ranging from -10km to 0.
  2. Finds the coastlines, where the height is 0.
  3. Sets the bathymetry to zero where it is positive, but this seems to change the sign.
  4. Switches the sign of the bathymetry.

One concern that I have is that afterwards bat ranges from 0 to 10 km. Shouldn't bat be negative? @simone-silvestri?

I see what I did wrong in my attempt to define a flat bottom. When I tried to set bat equation to -5km, I see it equal to this everywhere, including the coastlines. I need to find a way to only change the depth where there is water. Hmm...

francispoulin avatar Mar 13 '24 13:03 francispoulin

I tried the simulation commenting out the line bat = - bat and that become unstable in a few hours, so clearly not correct.

I also put this line back in and forced the minimum dept to be 50 m instead of 10 m, and that became unstable after 53 days instead of 52 days. Not much of a difference.

francispoulin avatar Mar 13 '24 16:03 francispoulin

I thought I would try running the global model in the hydrostatic context with only a few points in the vertical, to see what that looks like. The only one I could find is in multiregion but that seems to require multiple GPUs.

I remember there was code back in the day that ran on one GPU. Is this code still around? If not I suppose I can modify the multi region one but I thought I would ask before I got started.

francispoulin avatar Mar 14 '24 17:03 francispoulin

btw @francispoulin, when we merge this all commits will get squashed into one

(just mentioning this in case this is why you closed #3486 in favor of this PR)

navidcy avatar Mar 17 '24 12:03 navidcy

I redid the global shallow water run with ShallowWaterScalarDiffusivity using νh = 1e+1, the same value that is used in the hydrostaticmodel for the HorizontalScalarDiffusivity. Note that the hydrostatic model also has vertical diffusivity and convective adjustment in the closure schemes. The original code had biharmonic_viscosity but I had to remove that to get it to run.

Sadly, even with closure the run still died after 54 days. Slightly longer than without closure but not a significant difference.

Here is the result of the simulation, in case anyone is curious.

https://github.com/CliMA/Oceananigans.jl/assets/8239041/525326cd-9ed3-4e7c-8c2a-510afe1cbcdc

francispoulin avatar Mar 18 '24 13:03 francispoulin

@simone-silvestri

This is a result from our updated code that shows the free-surface height restricted to 9.9 and 10.1. You can clearly see that the instabilities happend around three particular regions.

https://github.com/CliMA/Oceananigans.jl/assets/8239041/af15615a-7194-4ffc-88aa-69276b2e4c9a

francispoulin avatar Mar 19 '24 16:03 francispoulin

Here are some plots that maybe help to explain the difficulties we are having here.

First, a plot of the free-surface height and we see two regions in the Indian ocean where the height is close to zero.

η557

Second, this is a close up of the region in the centre of the Indian ocean.

η557_closeup1

Third, we see the topography for the same close up region.

bat_closeup1

Fourth, this is a zonal slice of the bathymetry, and see see that the height gets very shallow, it's 86 m below.

bat_slice_closeup1

@simone-silvestri Do we want to change the tolerance parameters of the simultion or smooth the bathymetry?

francispoulin avatar Mar 19 '24 20:03 francispoulin

I would try smoothing the bathymetry to see if the problem persists or disallow shallow regions below 100m

simone-silvestri avatar Mar 19 '24 21:03 simone-silvestri

Thanks for the suggestion @simone-silvestri .

I did the easier thing and cut off the depth at 100m. It runs until 3 days before we get blow up. The movie is pretty fun to watch and we see a lot of gravity waves being radiated in Antarctica, but the blow up seems to be south of Australia. I don't know if the gravity wave radiation happens in the hydrostatic model at all. Do you recall?

This is in the invivscid limit. I will try again with some viscosity to see what affect that has.

https://github.com/CliMA/Oceananigans.jl/assets/8239041/4c174e0b-f1b9-43b9-801b-31a591243c8d

francispoulin avatar Mar 19 '24 23:03 francispoulin

@simone-silvestri

I made the change you suggested and do not allow any bathymetry within the top 100 meters, using the following lines of code

bat = file_bathymetry["bathymetry"]
bat[bat .> -100] .= 0

When I plot a slice through the same region as before, we see that the bathymetry now reaches the surface, as you can see below.

I wonder whether having one point at the surface still causes problems? If the highest most point was lowered to the same level as the two points to the right, do you think this would be better behaved?

bat_slice_closeup1

francispoulin avatar Mar 20 '24 12:03 francispoulin

These explorations are great but if in the end we figure out that it was the bathymetry fiddling that was the issue perhaps we should think of ClimaOcean as the place to hold this validation script? It includes tools to manipulate bathymetry…

navidcy avatar Mar 20 '24 13:03 navidcy

I am sorry that I hadn't looked at ClimaOcean.jl before. I just cloned it and it looks great! Thanks for putting this together.

I am happy to have this script be hosted anywhere, after we get it working. I am going to run your quarter of a degree example and see how that runs for a year, and see whether the parameters differ at all with what we are using in ShallowWater.

One question is, do we need this global model to run before we merge this PR? I am hoping this will make the shallow water model stable, but I am curious to know what people what before that happens. A long list is better than no list at all.

francispoulin avatar Mar 20 '24 14:03 francispoulin

Yestereday, @simone-silvestri and I had a chat and discussed the current state of the shallow water model.

We found that if we ignore this PR and run the validation script on main, then it runs for 20 days before it blows up. This is different from what we saw a year or two ago, but that is the current state of affairs.

We observed that there were very strong zonal velocities, +25 m/s, that occured in the arctic and the antarctic. This is presumably what gives rise to NaN's that appear. Maybe there is a problem with the immersed boundary method, we are not sure.

We took a side step and decided to study an acquaplanet case on main. We removed the continents and made the bathymetry flat. Below are the plots of u,v,h after 59 days. Note that you can still see the continents but I think that's because there is no forcing over land.

Some good news is that we do see the ACC and it's moving in the right direction! @navidcy

We also see that the velocites are much more reasonable, around .1 m/s.

One odd bit is the height field seems to be 0 at the top and bottom.

I am including an animation of u,h in case that is of interest.

Thoughts?

u_20days

v_20days

h_20days

https://github.com/CliMA/Oceananigans.jl/assets/8239041/035bb9ed-ce4b-42f1-ac7c-c17235d2aad9

francispoulin avatar Apr 17 '24 12:04 francispoulin

How is the simulation forced? I don't understand why it blows up. Is there a violation of CFL?

glwagner avatar Apr 17 '24 17:04 glwagner

How is the simulation forced? I don't understand why it blows up. Is there a violation of CFL?

It is being forced by realistic wind stress and bathymetry. The same forcing exactly (I believe) that's used in the hydrostatic model.

The height field goes to zero and that causes the numerical instability.

We are not using the time step wizard but even though the winds are over 25 m/s, it doesn't go unstable until after 20 days. A surprise perhaps. The high velocities happen in the most polar regions, where the depths tend to be shallow.

francispoulin avatar Apr 17 '24 18:04 francispoulin

To recap:

When we have an acquaplanet with a flat bottom, the shallow water nearly global quarter of a degree model is stable for at least two years (not shown here). This is in contrast to when we included coastlines where it becomes unstable after about 20 days.

To study an intermidate situation, where we have continents but with a flat bottom, I set the depth of the ocean to be 6 km for no especially good reason. This simulation becoems unstable after 60 days. This suggests to me that the problem is with the immersed boundary, where we include the coastlines. @simone-silvestri

https://github.com/CliMA/Oceananigans.jl/assets/8239041/9ce47ed1-b684-4b59-890a-ad343d142e16

francispoulin avatar Apr 23 '24 17:04 francispoulin