Enzyme.jl

Enzyme with Oceananigans simulation produces a stack overflow

Open jlk9 opened this issue 9 months ago • 11 comments

Running dynamical_core/autodiff_double_gyre.jl in this branch:

https://github.com/DJ4Earth/Enzymanigans.jl/tree/jlk9/reduce-for-stack-overflow-0318

produces LoadError: StackOverflowError

This happens with the current versions of Oceananigans.jl, Enzyme.jl, ClimaOcean.jl, and KernelAbstractions.jl. Reverting to Oceananigans 0.95.13 with current Enzyme allows it to run; the stack overflow began with 0.95.14.

@wsmoses

jlk9, Mar 18 '25 15:03

Full error message:

ERROR: LoadError: StackOverflowError:
in expression starting at /Users/jkump/Desktop/Enzymanigans.jl/dynamical_core/autodiff_double_gyre.jl:86

jlk9, Mar 18 '25 15:03

Is there more of an error message, or is that it?

wsmoses, Mar 18 '25 15:03

This is the entire error message.

jlk9, Mar 18 '25 16:03

Line 86:

dedν = autodiff(set_runtime_activity(Enzyme.Reverse),
                estimate_tracer_error, Active,
                Duplicated(simulation, dsim),
                Duplicated(Tᵢ, dTᵢ),
                Duplicated(Sᵢ, dSᵢ),
                Duplicated(J, dJ),
                Duplicated(mld, dmld))

glwagner, Mar 18 '25 16:03

What if you don't compute mld?

glwagner, Mar 18 '25 16:03

Trying that. The old stable_diffusion script still works with current Oceananigans and Enzyme, and it never called run!(simulation), so I'm also running versions of the script without run!(simulation). I removed the momentum and tracer advection and still get the stack overflow, so those aren't the cause.

jlk9, Mar 18 '25 16:03

Make sure to test with julia -O0

glwagner, Mar 18 '25 16:03

... still running the test with compute!(mld) removed. However, replacing run!(simulation) with a loop of time_step!(simulation.model, 20minutes; euler=true) allowed the reduced script to complete without any error. Now I'm testing the unreduced autodiff_double_gyre.jl with this change to make sure.

jlk9, Mar 18 '25 17:03

I suggest adding a test to Oceananigans for Simulation if we want to use it

glwagner, Mar 18 '25 17:03

That said, I don't think we need Simulation for this work, at least not yet.

glwagner, Mar 18 '25 18:03

> I suggest adding a test to Oceananigans for Simulation if we want to use it

Although it wasn't breaking here, I also want to add test coverage for momentum advection and buoyancy. I'll make PRs for those today.

> That said, I don't think we need Simulation for this work, at least not yet.

Yes, we can work around it for now.

jlk9, Mar 18 '25 18:03