Oceananigans.jl
User-facing `on_grid` utility
Occasionally you'll find user scripts peppered with things like
Q = arch_array(arch, Q)
which converts Q to a CuArray if needed, and vice versa. Or, more recently:
https://github.com/CliMA/Oceananigans.jl/blob/5cc9653584370e7cbbd828583d4129628eb20fd0/validation/multi_region/multi_region_near_global_quarter_degree.jl#L117
on a multi-region grid, which "partitions" a global array onto different devices.
That last pattern is also needed for distributed problems in which global-size data is either built or loaded from disk.
I propose we implement one utility for all these cases, called something like on_grid(obj, grid) (note I'm reversing the argument order relative to arch_array; I think that's what we want, but it's something to discuss carefully. It's also a problem that multi_region_object_from_array and arch_array have different syntax).
Usually one can write generic code for CPU/GPU, except when building boundary conditions in terms of arrays, where we do not want to automatically convert from CPU to GPU. In that case users need to write
Q = on_grid(Q, grid)
since grid has grid.architecture, this will change to CPU or GPU as needed.
For distributed problems we also want
Q = on_grid(Q, grid)
if Q is loaded from file, for example. If Q has the size of the global data, we will partition it into a local version (since the grid is also local). We can "detect" whether Q already has the local size (though there are some subtleties re: dimensionality...) and handle that case. We can also transfer it to the correct architecture.
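To make the size-detection idea concrete, here is a minimal sketch in plain Julia. It is not Oceananigans code: `ToyLocalGrid`, its fields, and this `on_grid` method are hypothetical stand-ins, the decomposition is assumed to be even, and the dimensionality subtleties and the architecture transfer mentioned above are ignored.

```julia
# Hypothetical stand-in for a distributed (rank-local) grid; not the Oceananigans type.
struct ToyLocalGrid
    local_size  :: NTuple{2, Int}  # (Nx, Ny) owned by this rank
    global_size :: NTuple{2, Int}  # (Nx, Ny) of the full domain
    rank_index  :: NTuple{2, Int}  # 1-based position of this rank in the process layout
end

# Hypothetical on_grid: pass local-size data through, partition global-size data.
function on_grid(Q::AbstractMatrix, grid::ToyLocalGrid)
    size(Q) == grid.local_size && return Q   # already local, nothing to do

    size(Q) == grid.global_size ||
        throw(ArgumentError("size(Q) = $(size(Q)) matches neither the local size " *
                            "$(grid.local_size) nor the global size $(grid.global_size)"))

    # Index ranges owned by this rank, assuming an even decomposition.
    ranges = ntuple(2) do d
        n = grid.local_size[d]
        start = (grid.rank_index[d] - 1) * n + 1
        start:(start + n - 1)
    end

    return Q[ranges...]
end

# An 8×4 "global" array, seen from the second of two ranks along x.
Q_global = reshape(collect(1.0:32.0), 8, 4)
grid = ToyLocalGrid((4, 4), (8, 4), (2, 1))
Q_local = on_grid(Q_global, grid)   # rows 5:8, all four columns
```

Calling `on_grid` again on `Q_local` returns it unchanged, which is what lets the same user script work whether the data arrives at local or global size.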
For multi-region problems we write
Q = on_grid(Q, grid)
which will return a MultiRegionObject with Q appropriately partitioned.
I think this will help users write generic code that can run on any grid + architecture.
Other names are definitely welcome!
I agree. I would even propose going further, so that the user doesn't need to care at all! Why do we expect the user to call Q = on_grid(Q, grid)? For example, set! could call it behind the scenes. Or other functions/methods, wherever needed?
I guess it's mostly for boundary conditions. With boundary conditions I don't think we want to auto-partition, because very often the reason for using a boundary condition array is to pass fluxes between models or from data into a model. I think the most usable / friendly solution is to throw a helpful error that tells the user to use on_grid.
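As a rough illustration of the "helpful error" option, the sketch below checks an array against a grid and points the user at on_grid when they do not match. Everything here is hypothetical: `ToyGrid` and `check_boundary_array` are made-up names, and only a size mismatch is checked, not the architecture.

```julia
# Hypothetical toy grid; only the size matters for this sketch.
struct ToyGrid
    size :: NTuple{2, Int}
end

# Hypothetical check that a boundary-condition constructor could run on array data.
function check_boundary_array(Q::AbstractArray, grid::ToyGrid)
    size(Q) == grid.size && return Q
    msg = "Boundary condition array has size $(size(Q)), but the grid expects $(grid.size). " *
          "Convert it explicitly first, for example Q = on_grid(Q, grid)."
    throw(ArgumentError(msg))
end

grid = ToyGrid((4, 4))
check_boundary_array(zeros(4, 4), grid)       # matching size passes through

err = try
    check_boundary_array(zeros(8, 4), grid)   # global-size data on a local grid
catch e
    e
end
```

Here `err` is an `ArgumentError` whose message names on_grid, so nothing is partitioned behind the user's back, but the fix is one line away.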