
User-facing `on_grid` utility

Open glwagner opened this issue 3 years ago • 2 comments

Occasionally you'll find user scripts peppered with things like

Q = arch_array(arch, Q)

which changes Q to CuArray if needed and vice versa. Or, more recently:

https://github.com/CliMA/Oceananigans.jl/blob/5cc9653584370e7cbbd828583d4129628eb20fd0/validation/multi_region/multi_region_near_global_quarter_degree.jl#L117

on a multi-region grid, which "partitions" a global array onto different devices.

That last pattern is also needed for distributed problems in which global-size data is either built or loaded from disk.

I propose we implement one utility for all these cases, called something like on_grid(obj, grid). Note that I'm reversing the argument order relative to arch_array; I think that's what we want, but it's something to discuss carefully. It's also a problem that multi_region_object_from_array and arch_array have different syntax.

Usually one can write generic code for CPU/GPU --- except when building boundary conditions in terms of arrays, where we do not want to automatically convert from CPU to GPU. In that case users need to write

Q = on_grid(Q, grid)

since grid has grid.architecture, this will change to CPU or GPU as needed.

For distributed problems we also want

Q = on_grid(Q, grid)

if Q is loaded from file, for example. If Q has the size of the global data, we will partition it into a local version (since the grid is also local). We can "detect" whether Q already has the local size (though there are some subtleties re: dimensionality...) and handle that case. We can also transfer it to the correct architecture.

For multi-region problems we write

Q = on_grid(Q, grid)

which will return a MultiRegionObject with Q appropriately partitioned.
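
To make the three cases above concrete, here is a rough, self-contained sketch of how a single on_grid could dispatch on the grid. Everything in it is a hypothetical stand-in written for illustration: SketchGrid, SketchMultiRegionGrid, HostArch and DistributedArch are not Oceananigans types, the partitioning is only along the first dimension, and a plain Vector of arrays stands in for a MultiRegionObject. A GPU method is left out so the sketch runs without CUDA; it would presumably convert to a device array instead of calling Array.

```julia
# Hypothetical stand-ins for illustration only; not the Oceananigans types.
struct HostArch end                  # single-process CPU

struct DistributedArch               # one rank of an x-partitioned problem
    rank  :: Int                     # 0-based rank index along x
    ranks :: Int                     # total number of ranks along x
end

struct SketchGrid{A}
    architecture :: A
    size :: Tuple{Int, Int}          # local (Nx, Ny)
end

struct SketchMultiRegionGrid
    regions :: Vector{SketchGrid{HostArch}}   # regions partitioned along x
end

# Case 1: architecture transfer. On the CPU this is just a host array.
on_grid(Q::AbstractMatrix, grid::SketchGrid{HostArch}) = Array(Q)

# Case 2: distributed. Keep local-size data, partition global-size data.
function on_grid(Q::AbstractMatrix, grid::SketchGrid{DistributedArch})
    arch = grid.architecture
    Nx, Ny = grid.size
    size(Q) == (Nx, Ny) && return Array(Q)    # already local
    size(Q) == (Nx * arch.ranks, Ny) ||
        throw(ArgumentError("size $(size(Q)) is neither local nor global"))
    first_i = arch.rank * Nx + 1              # first global x-index owned by this rank
    return Q[first_i:first_i + Nx - 1, :]
end

# Case 3: multi-region. Return one slab per region.
function on_grid(Q::AbstractMatrix, grid::SketchMultiRegionGrid)
    offsets = cumsum([0; [region.size[1] for region in grid.regions]])
    return [Q[offsets[r] + 1:offsets[r + 1], :] for r in eachindex(grid.regions)]
end

# The user-facing call is identical in all three cases.
Q_global = reshape(collect(1.0:8.0), 4, 2)

Q_cpu   = on_grid(Q_global, SketchGrid(HostArch(), (4, 2)))
Q_local = on_grid(Q_global, SketchGrid(DistributedArch(1, 2), (2, 2)))
Q_multi = on_grid(Q_global, SketchMultiRegionGrid([SketchGrid(HostArch(), (1, 2)),
                                                   SketchGrid(HostArch(), (3, 2))]))
```

Note that when there is only one rank the local and global sizes coincide, so the size check cannot tell the two cases apart; that is harmless here, but it is one face of the dimensionality subtleties mentioned above.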

I think this will help users write generic code that can run on any grid + architecture.

Other names are definitely welcome!

glwagner avatar May 07 '22 16:05 glwagner

I agree. I would even propose a way that the user doesn't even need to care! Why do we expect the user to call Q = on_grid(Q, grid)? For example, set! could be calling that behind the scenes. Or other functions/methods wherever needed?

navidcy avatar May 07 '22 21:05 navidcy

I guess it's mostly for boundary conditions. With boundary conditions I don't think we want to auto-partition, because very often the point of a boundary condition array is to pass fluxes between models or from data into a model. I think the most usable / friendly solution is to throw a helpful error that tells the user to use on_grid.
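
As a rough illustration of what such an error could look like, here is a minimal sketch. The function name check_boundary_array and the message wording are hypothetical, and the check only compares sizes; a real check would presumably also look at the array type against grid.architecture.

```julia
# Hypothetical validation step for an array-valued boundary condition.
# `local_size` is the size the local grid expects for the boundary array.
function check_boundary_array(Q::AbstractArray, local_size::Tuple)
    size(Q) == local_size && return Q   # nothing to do, pass the array through
    throw(ArgumentError("boundary condition array has size $(size(Q)) but the grid " *
                        "expects $(local_size). Use `Q = on_grid(Q, grid)` to partition " *
                        "or transfer it before building the boundary condition."))
end
```

The array is never modified or copied here, so data shared between models stays shared; the user opts in to partitioning explicitly by calling on_grid.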

glwagner avatar May 07 '22 21:05 glwagner