Enzyme with Oceananigans simulation produces a stack overflow
Running dynamical_core/autodiff_double_gyre.jl in this branch:
https://github.com/DJ4Earth/Enzymanigans.jl/tree/jlk9/reduce-for-stack-overflow-0318
produces LoadError: StackOverflowError
This is with the current versions of Oceananigans.jl, Enzyme.jl, ClimaOcean.jl, and KernelAbstractions.jl. Reverting to Oceananigans 0.95.13 with current Enzyme allows it to run; the stack overflow began with 0.95.14.
@wsmoses
Full error message:
ERROR: LoadError: StackOverflowError:
in expression starting at /Users/jkump/Desktop/Enzymanigans.jl/dynamical_core/autodiff_double_gyre.jl:86
Is there more of an error message, or is that it?
This is the entire error message.
Line 86:
dedν = autodiff(set_runtime_activity(Enzyme.Reverse),
                estimate_tracer_error, Active,
                Duplicated(simulation, dsim),
                Duplicated(Tᵢ, dTᵢ),
                Duplicated(Sᵢ, dSᵢ),
                Duplicated(J, dJ),
                Duplicated(mld, dmld))
What if you don't compute mld?
Trying that. The old stable_diffusion script still works with current Oceananigans and Enzyme, so I'm also running versions of the script without run!(simulation), since stable_diffusion never had that. I removed the momentum and tracer advection and am still getting the stack overflow, so those aren't the cause.
Make sure to test with julia -O0
Still running the test with compute!(mld) removed.
But replacing run!(simulation) with a loop of time_step!(simulation.model, 20minutes; euler=true) allowed the reduced script to complete without any error. Now I'm testing the unreduced autodiff_double_gyre.jl with this change to make sure.
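For reference, the workaround described above can be sketched roughly as follows. This is a minimal illustration, not the code from the branch: the helper name step_manually! and the step count are hypothetical, and the stepping function is passed in as an argument so the sketch stays independent of any particular package version.

```julia
# Hypothetical helper (not part of Oceananigans): advance `model` by calling
# `step!(model, Δt)` exactly `nsteps` times, as a stand-in for run!(simulation).
function step_manually!(step!, model, Δt, nsteps::Integer)
    for _ in 1:nsteps
        step!(model, Δt)
    end
    return model
end

# Assumed usage inside the differentiated function, with Oceananigans loaded
# and an arbitrary step count of 10 (the thread does not state the count):
#
#   step_manually!(simulation.model, 20minutes, 10) do model, Δt
#       time_step!(model, Δt; euler=true)
#   end
```

Passing the stepping call as a do-block keeps the loop explicit, which is the only change the workaround relies on; whether Enzyme handles the closure as well as an inlined loop has not been checked here.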
I suggest adding a test to Oceananigans for Simulation if we want to use it
That said, I don't think we need Simulation for this work, at least not yet.
I suggest adding a test to Oceananigans for Simulation if we want to use it
Although it wasn't breaking here, I also want to add test coverage for momentum advection and buoyancy. I'll make PRs for those today.
That said, I don't think we need Simulation for this work, at least not yet.
Yes, we can work around it for now.