Oceananigans.jl icon indicating copy to clipboard operation
Oceananigans.jl copied to clipboard

Optimizes fill_halo_regions_open.jl

Open jagoosw opened this issue 1 year ago • 6 comments

I was doing some profiling on a model with no open boundaries and discovered that this function was causing a big slow down. I guess this is because the compiler isn't managing to work out its just a load of nothing operations but this change appears to make it completely go away.

jagoosw avatar Sep 25 '24 21:09 jagoosw

what was the slowdown?

simone-silvestri avatar Sep 25 '24 22:09 simone-silvestri

Might be good to profile with fill_open_boundary_regions!(fields::NTuple, boundary_conditions, indices, loc, grid, args...; kwargs...) = nothing also to check this

glwagner avatar Sep 26 '24 01:09 glwagner

The profiles should be documented here. I'll tentatively approve but the documentation is important

glwagner avatar Sep 26 '24 01:09 glwagner

Ah so I've realised this isn't the fix we needed, and I was just hiding it from myself in the profile because I replaced the function by writing it in the REPL. I made an MWE:

using Oceananigans

grid = RectilinearGrid(GPU(), topology = (Flat, Flat, Bounded), size = (100, ), extent = (400, ))

model = HydrostaticFreeSurfaceModel(; grid, velocities = PrescribedVelocityFields(), momentum_advection=nothing, buoyancy=nothing, tracers = ntuple(n->Symbol(:T, n), Val(30)))
Screenshot 2024-09-26 at 12 00 29 You can see from this profile that `fill_open_boundary_regions!` takes a lot longer than `fill_halo_event!`, even though there are no velocity open boundaries. This is because it is launching a load of zero size kernels where as `fill_halo_event!` just returns nothing instead.

I've fixed this now and get this from the profile instead: Screenshot 2024-09-26 at 12 02 11

In numbers, the original version benchmarks time_step! at around 4.074 ms ± 581.472 μs and the new version 2.438 ms ± 501.642 μs

jagoosw avatar Sep 26 '24 10:09 jagoosw

I think it's good to go.

The only thing that doesn't quite make sense to me is why

fill_size = fill_halo_size(field, regular_fill_function, indices, boundary_conditions, loc, grid)

depends on regular_fill_function, since

fill_function, regular_fill_function = get_open_halo_filling_functions(loc) 

and loc is an argument to both functions. It doesn't seem that from a purely logical point of view we need regular_fill_function at all here.

glwagner avatar Sep 27 '24 13:09 glwagner

I'll stream line those two line before I merge

jagoosw avatar Sep 30 '24 09:09 jagoosw

Is this PR close to merging?

simone-silvestri avatar Nov 11 '24 10:11 simone-silvestri

cc @ali-ramadhan

tomchor avatar Nov 14 '24 09:11 tomchor

I think this has been superseded by #3834. @jagoosw let us know if it is the case or not.

simone-silvestri avatar Nov 14 '24 10:11 simone-silvestri

I think this has been superseded by #3834. @jagoosw let us know if it is the case or not.

Yeah definitely superseded!

jagoosw avatar Nov 14 '24 10:11 jagoosw