Small improvements to NaNChecker
Inspired by a discussion with @navidcy and @taimoorsohail, this PR implements two small improvements to the NaNChecker:
- Preallocate an object to store the result of the reduction which computes whether a field has NaNs or not, which saves some memory allocation (can be important if trying to diagnose NaNs every time-step
- Return the result of NaN checking from the callback function. Useful if we don't want to add NaNChecker directly as a callback but rather call it within another callback.
I may add a few more improvements here, because more ideas are coming to mind...
bunch of tests fail, e.g.,
https://buildkite.com/clima/oceananigans/builds/23183#01969c41-533f-4867-bc39-028df30cd953/6-37
Is it because the fallback
NaNChecker(fields) = NaNChecker(fields, false) # default
was removed? Or perhaps because the fallback has an arg and now it became kwarg?
I think because somebody is using that constructor. I think its better to have just one constructor so I will try to fix the errors
@taimoorsohail, should work now and test will (hopefully) pass
Tests are still failing - having a look now
I haven't been able to figure out why these tests are failing - other than that they don't seem to be related to the NaN checker changes as far as I can tell.
This
https://buildkite.com/clima/oceananigans/builds/23277#0196b2e0-8d99-4d3d-a91d-1a4e8c0275e9/18-1245
might seem unrelated but following the error stack trace it does originate from the nanchecker
https://buildkite.com/clima/oceananigans/builds/23277#0196b2e0-8d99-4d3d-a91d-1a4e8c0275e9/18-1273
I'll have a look now.
@glwagner, seems like hasnan breaks for MultiRegion (probably also Distributed?) fields? Here's a mwe:
using Oceananigans
grid = RectilinearGrid(CPU(), size = (64, 3, 1), x = (0, 100e3), y = (0, 100e3), z = (-400, 0))
regions = 2
mrg = MultiRegionGrid(grid, partition = XPartition(regions))
model = HydrostaticFreeSurfaceModel(grid = mrg)
simulation = Simulation(model; Δt=0.1, stop_iteration = 10)
run!(simulation)
julia> run!(simulation)
[2025/05/09 12:12:48.874] INFO Initializing simulation...
ERROR: MethodError: no method matching interior(::MultiRegionObject{…}, ::Tuple{…}, ::MultiRegionGrid{…}, ::MultiRegionObject{…})
Closest candidates are:
interior(::Field, ::Any...)
@ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Fields/field.jl:407
interior(::FieldTimeSeries, ::Any...)
@ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/OutputReaders/field_time_series.jl:724
interior(::OffsetArray, ::Any, ::Any, ::Any)
@ Oceananigans ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Fields/field.jl:406
...
Stacktrace:
[1] interior(f::Field{…})
@ Oceananigans.Fields ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Fields/field.jl:405
[2] any!(f::Function, r::Field{…}, a::Field{…}; condition::Nothing, mask::Bool, kwargs::@Kwargs{})
@ Oceananigans.Fields ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Fields/field.jl:682
[3] any!(f::Function, r::Field{…}, a::Field{…})
@ Oceananigans.Fields ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Fields/field.jl:673
[4] hasnan(field::Field{…}, checker::Oceananigans.Models.NaNChecker{…})
@ Oceananigans.Models ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Models/nan_checker.jl:40
[5] (::Oceananigans.Models.NaNChecker{…})(simulation::Simulation{…})
@ Oceananigans.Models ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Models/nan_checker.jl:49
[6] (::Callback{…})(sim::Simulation{…})
@ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Simulations/callback.jl:15
[7] initialize!(sim::Simulation{…})
@ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Simulations/run.jl:235
[8] time_step!(sim::Simulation{…})
@ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Simulations/run.jl:136
[9] run!(sim::Simulation{…}; pickup::Bool)
@ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Simulations/run.jl:105
[10] run!(sim::Simulation{…})
@ Oceananigans.Simulations ~/Library/CloudStorage/OneDrive-TheUniversityofMelbourne/Documents/Research/Oceananigans.jl-v3/src/Simulations/run.jl:92
[11] top-level scope
@ REPL[75]:1
Some type information was truncated. Use `show(err)` to see complete types.
Wondering if we need @apply_regionally at
https://github.com/CliMA/Oceananigans.jl/blob/9454ebc7637e96f152c1aa0410875c1449c333b1/src/Models/nan_checker.jl#L40
Actually on second thought I'm wondering how would this work on distributed architectures. Do we need to be calling any! eg on each MPI rank and then combine?
Wondering if we need
@apply_regionallyathttps://github.com/CliMA/Oceananigans.jl/blob/9454ebc7637e96f152c1aa0410875c1449c333b1/src/Models/nan_checker.jl#L40
shouldn't that work under the hood? same with distributed. otherwise users cannot reduce either without @apply_regionally
I think it should work after https://github.com/CliMA/Oceananigans.jl/pull/4497
Is this ready to merge once tests pass? now that #4520 is merged?
No. First #4497 needs to get in
Thanks!