Support OASIS flavor of SCRIP grid files (open_multigrid)

Open nils3er opened this issue 2 months ago • 9 comments

Proposed new feature or change:

In YAC, we frequently use the OASIS flavor of SCRIP grid files, for example to write out grids for debugging purposes. It would be very useful if uxarray could read these files directly.

The main difference between the standard SCRIP format and the OASIS flavor is that an OASIS file can contain multiple grids. As a result, grid variable and dimension names are prefixed with the grid name. For example:

$ ncdump -h healpix3_grid.nc 
netcdf healpix3_grid {
dimensions:
	nv_healpix3 = 4 ;
	nc_healpix3 = 768 ;
variables:
	double healpix3.cla(nc_healpix3, nv_healpix3) ;
		healpix3.cla:units = "degree" ;
	double healpix3.clo(nc_healpix3, nv_healpix3) ;
		healpix3.clo:units = "degree" ;
	int healpix3.gid(nc_healpix3) ;
	int healpix3.vgid(nc_healpix3, nv_healpix3) ;
	int healpix3.egid(nc_healpix3, nv_healpix3) ;
	int healpix3.rnk(nc_healpix3) ;

// global attributes:
		:YAC = "3.11.0" ;
		:title = "Created by YAC" ;
		:description = "Created by YAC" ;
		:grid = "curvilinear" ;
		:timeStamp = "2025-11-07T09:34:33Z" ;
}
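The prefix convention shown in the header can be sketched in plain Python. This is only an illustration, not uxarray code: the function name `list_grid_prefixes` is hypothetical, and it assumes that a grid is identified by having both a `<grid>.cla` and a `<grid>.clo` variable, as in the header above.

```python
# Assumption: grid variables are named "<grid>.<suffix>" and a usable grid
# provides both corner variables, cla (corner latitude) and clo (corner longitude).
CORNER_SUFFIXES = {"cla", "clo"}


def list_grid_prefixes(variable_names):
    """Return the sorted grid names that provide both corner variables."""
    found = {}
    for name in variable_names:
        prefix, sep, suffix = name.rpartition(".")
        if sep and prefix and suffix in CORNER_SUFFIXES:
            found.setdefault(prefix, set()).add(suffix)
    return sorted(p for p, seen in found.items() if seen == CORNER_SUFFIXES)


# Variable names taken from the healpix3_grid.nc header above.
names = [
    "healpix3.cla", "healpix3.clo", "healpix3.gid",
    "healpix3.vgid", "healpix3.egid", "healpix3.rnk",
]
print(list_grid_prefixes(names))
```

A variable without a dot, or a grid with only one of the two corner variables, is ignored in this sketch.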

The format specification for the OASIS flavor of SCRIP can be found here:
https://cerfacs.fr/oa4web/oasis3-mct_5.0/oasis3mct_UserGuide/node50.html

nils3er avatar Nov 07 '25 09:11 nils3er

Thanks for raising this — the OASIS flavor of SCRIP is definitely something we can support.

Before implementing, we need to clarify one design point: how should UXarray handle files containing multiple grids?

Conceptually the read path would be:

detect format → extract prefixes → build per-grid Dataset → convert to UGrid
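As a rough sketch of the middle steps of that path, the following pure-Python example groups prefixed variables per grid and then deduplicates the per-cell corners into a node list plus face-node connectivity. Both function names are hypothetical, the rounding tolerance is an assumption, and special cases such as pole points or padded corners are not handled; it does not reflect how uxarray implements its readers.

```python
def split_by_grid(variables):
    """Group {"<grid>.<suffix>": data} into {grid: {suffix: data}}."""
    grids = {}
    for name, data in variables.items():
        prefix, sep, suffix = name.rpartition(".")
        if sep and prefix:
            grids.setdefault(prefix, {})[suffix] = data
    return grids


def corners_to_connectivity(cla, clo, ndigits=9):
    """Turn per-cell corner lat/lon rows into unique nodes and face-node indices."""
    node_index = {}  # (lon, lat) -> node id
    nodes = []       # unique (lon, lat) pairs in order of first appearance
    faces = []       # one list of node ids per cell
    for lat_row, lon_row in zip(cla, clo):
        face = []
        for lat, lon in zip(lat_row, lon_row):
            # Assumption: corners that agree after rounding are the same node.
            key = (round(lon % 360.0, ndigits), round(lat, ndigits))
            if key not in node_index:
                node_index[key] = len(nodes)
                nodes.append(key)
            face.append(node_index[key])
        faces.append(face)
    return nodes, faces


# Two made-up quadrilateral cells that share one edge.
variables = {
    "a.cla": [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]],
    "a.clo": [[0.0, 1.0, 1.0, 0.0], [1.0, 2.0, 2.0, 1.0]],
}
per_grid = split_by_grid(variables)
nodes, faces = corners_to_connectivity(per_grid["a"]["cla"], per_grid["a"]["clo"])
print(len(nodes), faces)  # 6 [[0, 1, 2, 3], [1, 4, 5, 2]]
```

The shared edge shows up as node ids 1 and 2 appearing in both faces, which is the information a UGRID-style connectivity array needs.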

@erogluorhan — what API direction should we follow here? Options include allowing open_grid(path, gridname=...) or a load_all=True mode that returns a mapping of grids.

Also, @nils3er: how are data variables typically associated with these multiple grids? Do different fields correspond to different grid prefixes, or is this mainly for geometry/debugging?

Once we understand the expected use case, we can choose the right return structure.

rajeeja avatar Nov 07 '25 17:11 rajeeja

OASIS would be an exception to our API's strong assumption that a grid file contains a single grid. That said, instead of exposing it in the open_grid() or open_dataset() API, the solution we have used for similar exceptions would make more sense to me, something like:

ux.Grid.from_OASIS(path, gridname=..., load_all=False)
ux.UxDataset.from_OASIS(path, gridname=..., load_all=False)

Thoughts?

erogluorhan avatar Nov 07 '25 17:11 erogluorhan

Thanks, Orhan — that makes sense. Keeping OASIS separate from the main open_grid() and open_dataset() APIs sounds like a cleaner approach, especially since multi-grid files break the single-grid assumption baked into UXarray’s current design.

ux.Grid.from_OASIS(path, gridname=..., load_all=False)
ux.UxDataset.from_OASIS(path, gridname=..., load_all=False)

Does this signature look good to everyone, or should we consider any additional arguments or return types?

rajeeja avatar Nov 07 '25 18:11 rajeeja

We might want to generalize the open functions for multigrid data. Just don't put "OASIS" in the name, e.g. ux.Grid.from_multigrid.

The coupler history files from E3SM have data from multiple grids. But the file with the grid info has just one grid.

rljacob avatar Nov 07 '25 18:11 rljacob

Good point, this would avoid hardcoding "OASIS" into the API. Something like this?

ux.Grid.from_multigrid(path, gridname=..., load_all=False)
ux.UxDataset.from_multigrid(path, gridname=..., load_all=False)

rajeeja avatar Nov 07 '25 18:11 rajeeja

Hi Rajeev, thanks for taking up this request.

Usually the grid format is only used to provide the grids as input data for the OASIS coupler. No data fields are involved there. I have never seen a "dataset" in this format. Just grid files.

But one other point: in the context of coupling, a mask file is often provided together with these grid files to select the cells that are active. It would probably make sense to give the user an option to also pass this mask file. This would effectively just do a .isel(nface=...) on the grid.

You can find a grid file with several grids and the corresponding mask files for example here: https://zenodo.org/records/5342778
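The mask step described above boils down to turning a per-cell mask into a list of cell indices. A minimal sketch, assuming the mask has already been read and flattened to one value per cell: the function name `active_cell_indices` is hypothetical, and which value marks an active cell is left as a parameter because it depends on the convention of the mask file.

```python
def active_cell_indices(mask, active_value=1):
    """Return the indices of the cells whose mask entry equals active_value."""
    return [i for i, value in enumerate(mask) if value == active_value]


# Made-up mask for a five-cell grid.
mask = [1, 0, 1, 1, 0]

# These indices are what would be handed to the .isel(...) call mentioned above.
print(active_cell_indices(mask))                  # [0, 2, 3]
print(active_cell_indices(mask, active_value=0))  # [1, 4]
```

Keeping the active value configurable avoids baking one convention into the selection logic.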

nils3er avatar Nov 10 '25 12:11 nils3er

@erogluorhan I switched the naming from from_... to open_..., is that okay? I have a draft that I haven't pushed yet. How does this implementation look?

The draft adds ux.open_multigrid, a function designed to handle NetCDF files containing multiple grid topologies. It returns a dictionary whose keys are the grid names and whose values are the corresponding ux.Grid objects.

Here’s a breakdown of how to use it:

1. Load all grids from a file:

# Load all available grids
grids = ux.open_multigrid("grids.nc")

# Returns: {"bggd": <Grid>, "nogt": <Grid>, "sse2": <Grid>, "torc": <Grid>}

2. Load a specific subset of grids:

You can specify which grids you want to load using the gridnames argument.

# Load only the "torc" and "nogt" grids
grids = ux.open_multigrid("grids.nc", gridnames=["torc", "nogt"])

# Returns: {"torc": <Grid>, "nogt": <Grid>}

3. Apply masks during loading:

The mask_filename argument allows you to provide a separate NetCDF file containing masks (e.g., bggd.msk, nogt.msk). The function automatically matches the correct mask to each grid it loads and applies it, keeping only the active cells, i.e. those where the mask value is 1.

# Load all grids and apply their respective masks
grids = ux.open_multigrid("grids.nc", mask_filename="masks.nc")

4. Load specific grids with their masks:

You can combine gridnames and mask_filename to load only a subset of grids and apply their corresponding masks.

# Load only "torc" and "nogt", applying "torc.msk" and "nogt.msk"
grids = ux.open_multigrid("grids.nc",
                         gridnames=["torc", "nogt"],
                         mask_filename="masks.nc")
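The matching between requested grids and mask variables described in the steps above could look roughly like the following. This is a hypothetical sketch, not the draft implementation: the helper name `plan_masked_load` is invented, and both the error on unknown grid names and the `None` entry for a grid without a mask are assumptions about behavior the thread does not specify.

```python
def plan_masked_load(grid_names, mask_variables, gridnames=None):
    """Map each requested grid to its "<grid>.msk" variable, or None if absent."""
    selected = list(grid_names) if gridnames is None else list(gridnames)
    unknown = [g for g in selected if g not in grid_names]
    if unknown:
        # Assumption: asking for a grid that is not in the file is an error.
        raise ValueError(f"grids not found in file: {unknown}")
    available = set(mask_variables)
    return {g: (f"{g}.msk" if f"{g}.msk" in available else None) for g in selected}


# Grid names from the examples above; the list of mask variables is made up.
grid_names = ["bggd", "nogt", "sse2", "torc"]
mask_variables = ["bggd.msk", "nogt.msk", "torc.msk"]

print(plan_masked_load(grid_names, mask_variables, gridnames=["torc", "nogt"]))
# {'torc': 'torc.msk', 'nogt': 'nogt.msk'}
```

Separating this planning step from the actual file reading makes it easy to report missing masks or misspelled grid names before any grid is loaded.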

Helper Function

There is also a new helper function, ux.list_multigrid_names, which allows you to list the names of all available grids within a file without loading them into memory.

# List available grids without loading them
grid_names = ux.list_multigrid_names("grids.nc")

# Returns: ["bggd", "nogt", "sse2", "torc"]

rajeeja avatar Nov 10 '25 15:11 rajeeja

This looks good to me, given the description of use @nils3er provided.

erogluorhan avatar Nov 10 '25 16:11 erogluorhan

From a general "multigrid" usage perspective, though, there might be a need for dataset opening as well. I don't think we need to prioritize that for now.

erogluorhan avatar Nov 10 '25 16:11 erogluorhan