Oceananigans.jl icon indicating copy to clipboard operation
Oceananigans.jl copied to clipboard

Test differentiation of Oceananigans broadcast kernel with Enzyme

Open jlk9 opened this issue 1 year ago • 12 comments

There was a bug in some recent updates to KernelAbstractions.jl that caused Enzyme to break on broadcasting arrays in Oceananigans. This PR includes a test to make sure this bug doesn't occur again.

jlk9 avatar May 10 '24 17:05 jlk9

There was a bug in some recent updates to KernelAbstractions.jl that caused Enzyme to break on broadcasting arrays in Oceananigans. This PR includes a test to make sure this bug doesn't occur again.

Interesting! I think it's ok to add a broadcasting test. But it will be confusing to future developers if the test is explained / written as somehow testing a bug in another package. If there's a bug somewhere else, we need a test in that packge (presumably that has been added).

This test also seems a little complicated. Why not just write a simple function that does a broadcast, and then try to autodiff that? Why do we need initial conditions, models, etc?

For example

function times_c!(a, b, c)
    a .= b .* c # c is a number
    return sum(a) # or whatever we gotta return
end

grid = RectilinearGrid(arch, size=(1, 1, 1), extent=(1, 1, 1))
a = CenterField(grid)
b = CenterField(grid)
c = 2
@test try 
    autodiff(times_c!, a, b, c... # or something)
    true
catch
    false
end

It's super important for tests to be as short and easy to understand as possible, because maintaining test code is one of the main bottlenecks on development.

glwagner avatar May 10 '24 17:05 glwagner

Perhaps I can chime in here to give some context.

Something in the Oceananigans/KA/Enzyme/etc setup was breaking our integration test of the advection-diffusion (https://github.com/CliMA/Oceananigans.jl/pull/3480) which was blocking us for making progress for some time.

Eventually Joe successfully minimized it down to this point as a minimal error (effectively just testing successful AD of Oceanigans.Utils.launch!, which was failing). We later determined the root cause of the issue to be a problem in KA (https://github.com/JuliaGPU/KernelAbstractions.jl/pull/476).

The purpose of this is not to specifically act as a unit test for the individual KA issue, but to be a small unit test for Oceananigans modified launching infrastructure. That way if something else comes up as a bug in a future integration test, we can quickly find the root cause without weeks of debugging from the whole integration test.

wsmoses avatar May 11 '24 23:05 wsmoses

@glwagner I've modified the PR to make clear that this is providing unit tests of the Oceananigans setting utility/broadcast functionality, at at increasingly high level

wsmoses avatar May 11 '24 23:05 wsmoses

@glwagner can you give this another round of review?

wsmoses avatar May 13 '24 18:05 wsmoses

Okay, it looks like we want to test set!(model, ...) so we need a model!

The tests are failing? Which maybe means they are working.

https://buildkite.com/clima/oceananigans/builds/15738#018f734d-a866-49c7-9114-390d6932cb34/18-355

glwagner avatar May 13 '24 20:05 glwagner

Or more specifically that we need to enfore an Enzyme version bump to 0.12.5 rather than the 0.11.20 it is on

wsmoses avatar May 13 '24 20:05 wsmoses

word

glwagner avatar May 13 '24 20:05 glwagner

@glwagner I have no idea what the agent lost CI thing is

wsmoses avatar May 14 '24 02:05 wsmoses

Enzyme + Oceananigans Initialization Broadcast Kernel: Error During Test at /var/lib/buildkite-agent/builds/tartarus-14/clima/oceananigans/test/test_enzyme.jl:59
--
  | Got exception outside of a @test
  | MethodError: no constructors have been defined for Any

🤔

https://buildkite.com/clima/oceananigans/builds/15740#018f756a-5b07-47c3-be29-9fc9b4d9d3ab/29-348

glwagner avatar May 14 '24 17:05 glwagner

Ah the classic (which this test is designed to catch). We also need to bump KA @glwagner if you would do the honors.

wsmoses avatar May 14 '24 17:05 wsmoses

Ah the classic (which this test is designed to catch). We also need to bump KA @glwagner if you would do the honors.

sure thing -- to 0.9.19?

glwagner avatar May 14 '24 18:05 glwagner

Yeah

wsmoses avatar May 14 '24 18:05 wsmoses