Support OASIS flavor of SCRIP grid files (open_multigrid)
Proposed new feature or change:
In YAC, we frequently use the OASIS flavor of SCRIP grid files, for example to write out grids for debugging purposes. It would be very useful if uxarray could read these files directly.
The main difference between the standard SCRIP format and the OASIS flavor is that an OASIS file can contain multiple grids. As a result, grid variable and dimension names are prefixed with the grid name. For example:
$ ncdump -h healpix3_grid.nc
netcdf healpix3_grid {
dimensions:
    nv_healpix3 = 4 ;
    nc_healpix3 = 768 ;
variables:
    double healpix3.cla(nc_healpix3, nv_healpix3) ;
        healpix3.cla:units = "degree" ;
    double healpix3.clo(nc_healpix3, nv_healpix3) ;
        healpix3.clo:units = "degree" ;
    int healpix3.gid(nc_healpix3) ;
    int healpix3.vgid(nc_healpix3, nv_healpix3) ;
    int healpix3.egid(nc_healpix3, nv_healpix3) ;
    int healpix3.rnk(nc_healpix3) ;

// global attributes:
    :YAC = "3.11.0" ;
    :title = "Created by YAC" ;
    :description = "Created by YAC" ;
    :grid = "curvilinear" ;
    :timeStamp = "2025-11-07T09:34:33Z" ;
}
The format specification for the OASIS flavor of SCRIP can be found here:
https://cerfacs.fr/oa4web/oasis3-mct_5.0/oasis3mct_UserGuide/node50.html
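The prefix convention suggests one way to discover the grids in such a file: collect every prefix that has both a `.cla` and a `.clo` variable. Below is a minimal sketch in plain Python that assumes only the naming shown in the header above. `find_grid_names` is a hypothetical helper for illustration, not an existing uxarray function.

```python
def find_grid_names(variable_names):
    """Return sorted grid prefixes that have both corner-latitude (.cla)
    and corner-longitude (.clo) variables."""
    required = {"cla", "clo"}
    suffixes_by_grid = {}
    for name in variable_names:
        # rpartition splits on the last dot, so a prefix containing dots survives
        prefix, dot, suffix = name.rpartition(".")
        if dot and prefix:
            suffixes_by_grid.setdefault(prefix, set()).add(suffix)
    return sorted(
        grid for grid, suffixes in suffixes_by_grid.items() if required <= suffixes
    )


# Variable names taken from the ncdump header above
names = ["healpix3.cla", "healpix3.clo", "healpix3.gid", "healpix3.rnk"]
print(find_grid_names(names))
```

Requiring both corner variables means that a prefix with only bookkeeping variables (such as `.gid`) would not be reported as a grid.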
Thanks for raising this — the OASIS flavor of SCRIP is definitely something we can support.
Before implementing, we need to clarify one design point: how should UXarray handle files containing multiple grids?
Conceptually the read path would be:
detect format → extract prefixes → build per-grid Dataset → convert to UGrid
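As a rough illustration of the "build per-grid Dataset" step, the prefixed corner variables could be renamed to the standard SCRIP names so that an existing SCRIP reader can take over. The sketch below only builds the rename mapping. `scrip_rename_map` is a hypothetical helper, and passing the result to something like xarray's `Dataset.rename` is an assumption about how it would be used.

```python
# OASIS suffix -> standard SCRIP variable name (corner coordinates only)
OASIS_TO_SCRIP = {
    "cla": "grid_corner_lat",
    "clo": "grid_corner_lon",
}


def scrip_rename_map(gridname, variable_names):
    """Map '<gridname>.<suffix>' variables to SCRIP names, skipping the rest."""
    mapping = {}
    for suffix, scrip_name in OASIS_TO_SCRIP.items():
        oasis_name = f"{gridname}.{suffix}"
        if oasis_name in variable_names:
            mapping[oasis_name] = scrip_name
    return mapping


rename = scrip_rename_map(
    "healpix3", ["healpix3.cla", "healpix3.clo", "healpix3.gid"]
)
print(rename)
```

The dimensions (`nc_<grid>`, `nv_<grid>`) would need the same treatment; they are left out here to keep the sketch short.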
@erogluorhan — what API direction should we follow here? Options include allowing open_grid(path, gridname=...) or a load_all=True mode that returns a mapping of grids.
Also, @nils3er: how are data variables typically associated with these multiple grids? Do different fields correspond to different grid prefixes, or is this mainly for geometry/debugging?
Once we understand the expected use case, we can choose the right return structure.
OASIS would be an exception to our API's strong assumption that a grid file contains a single grid. That said, instead of exposing it in the open_grid() or open_dataset() API, the solution we used for similar exceptions would make more sense to me, something like:
ux.Grid.from_OASIS(path, gridname=..., load_all=False)
ux.UxDataset.from_OASIS(path, gridname=..., load_all=False)
Thoughts?
Thanks, Orhan — that makes sense. Keeping OASIS separate from the main open_grid() and open_dataset() APIs sounds like a cleaner approach, especially since multi-grid files break the single-grid assumption baked into UXarray’s current design.
ux.Grid.from_OASIS(path, gridname=..., load_all=False)
ux.UxDataset.from_OASIS(path, gridname=..., load_all=False)
Does this signature look good to everyone, or should we consider any additional arguments or return types?
We might want to generalize the open functions for multigrid data. Just don't put "OASIS" in the name; use something like ux.Grid.from_multigrid.
The coupler history files from E3SM have data from multiple grids, but the file with the grid info has just one grid.
Good point, this would avoid hardcoding "OASIS" into the API:
ux.Grid.from_multigrid(path, gridname=..., load_all=False)
ux.UxDataset.from_multigrid(path, gridname=..., load_all=False)
?
Hi Rajeev, thanks for taking up this request.
Usually the grid format is only used to provide the grids as input data for the OASIS coupler. No data fields are involved there. I have never seen a "dataset" in this format. Just grid files.
But one other point: in the context of coupling, a mask file is often provided together with these grid files to select the cells that are active. It would probably make sense to give the user an option to also pass this mask file. This would effectively only do a .isel(nface=...) on the grid.
You can find a grid file with several grids and the corresponding mask files for example here: https://zenodo.org/records/5342778
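To make the mask idea concrete, here is a minimal sketch of turning a per-cell mask into the index list that an `.isel(...)` call would receive. `active_face_indices` is a hypothetical helper, and which mask value marks an active cell is deliberately left as a required argument, since that convention should be checked against the mask file's specification.

```python
def active_face_indices(mask, *, active_value):
    """Return the positions of the cells whose mask equals active_value."""
    return [i for i, value in enumerate(mask) if value == active_value]


# Made-up mask for five cells, purely for illustration
mask = [1, 0, 1, 1, 0]
indices = active_face_indices(mask, active_value=1)
print(indices)
```

The resulting list is what would be handed to the face-dimension selection on the grid.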
@erogluorhan moved from_ to open.., is that okay?
I have a draft that I haven't pushed yet. How does this implementation look?
The draft adds ux.open_multigrid, a function designed to handle NetCDF files containing multiple grid topologies. It returns a dictionary where the keys are the grid names and the values are the ux.Grid objects.
Here’s a breakdown of how to use it:
1. Load all grids from a file:
# Load all available grids
grids = ux.open_multigrid("grids.nc")
# Returns: {"bggd": <Grid>, "nogt": <Grid>, "sse2": <Grid>, "torc": <Grid>}
2. Load a specific subset of grids:
You can specify which grids you want to load using the gridnames argument.
# Load only the "torc" and "nogt" grids
grids = ux.open_multigrid("grids.nc", gridnames=["torc", "nogt"])
# Returns: {"torc": <Grid>, "nogt": <Grid>}
3. Apply masks during loading:
The mask_filename argument allows you to provide a separate NetCDF file containing masks (e.g., bggd.msk, nogt.msk). The function will automatically match and apply the correct mask to each grid it loads, including only active cells where the mask value is 1.
# Load all grids and apply their respective masks
grids = ux.open_multigrid("grids.nc", mask_filename="masks.nc")
4. Load specific grids with their masks:
You can combine gridnames and mask_filename to load only a subset of grids and apply their corresponding masks.
# Load only "torc" and "nogt", applying "torc.msk" and "nogt.msk"
grids = ux.open_multigrid(
    "grids.nc",
    gridnames=["torc", "nogt"],
    mask_filename="masks.nc",
)
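One detail the usage above leaves open is what happens when a requested grid name is not in the file. The sketch below shows one possible resolution step: return every grid when no names are given, and fail early on unknown names. `select_grids` is a hypothetical helper, and raising `ValueError` is an assumption, not the draft's confirmed behavior.

```python
def select_grids(available, gridnames=None):
    """Resolve the grids to load: all of them, or a validated subset."""
    if gridnames is None:
        return list(available)
    missing = [name for name in gridnames if name not in available]
    if missing:
        raise ValueError(
            f"Grid(s) not found: {missing}; available: {list(available)}"
        )
    # Preserve the order in which the caller asked for the grids
    return list(gridnames)


available = ["bggd", "nogt", "sse2", "torc"]
print(select_grids(available))
print(select_grids(available, ["torc", "nogt"]))
```

Failing before any grid is read keeps a typo in `gridnames` from surfacing only after the other grids have been loaded.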
Helper Function
There is also a new helper function, ux.list_multigrid_names, which allows you to list the names of all available grids within a file without loading them into memory.
# List available grids without loading them
grid_names = ux.list_multigrid_names("grids.nc")
# Returns: ["bggd", "nogt", "sse2", "torc"]
This looks good to me, given the description of use @nils3er provided.
From a general "multigrid" use perspective, though, there might be a need for dataset opening as well, but I don't think we need to prioritize that for now.