Adding real-world circulation model datasets
Adds circulation model datasets mimicking the layout of real-world hydrodynamic models.
- [x] Chose the correct base branch (main for v3 changes, v4-dev for v4 changes)
- [ ] Fixes #
- [ ] Added tests
- [ ] Added documentation
Here's a screenshot of the output of a 'real' copernicusmarine dataset:
import copernicusmarine
import xarray as xr
copernicusmarine.subset(
dataset_id="cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m",
variables=["uo", "vo"],
minimum_longitude=-4,
maximum_longitude=10,
minimum_latitude=50,
maximum_latitude=60,
start_datetime="2023-01-01T00:00:00",
end_datetime="2023-01-31T00:00:00",
minimum_depth=0.49402499198913574,
maximum_depth=453.9377136230469,
username=<HIDDEN>,
password=<HIDDEN>,
)
ds = xr.open_dataset("cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m_uo-vo_4.00W-10.00E_50.00N-60.00N_0.49-453.94m_2023-01-01-2023-01-31.nc")
ds
Just also uploaded two NEMO C-grid files, as taken from Mercator Ocean International. Here's a screenshot of the xarray output of the V file (the U file is similar, but also has a few extra 2D fields).
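For anyone without the files at hand, the general shape of such a NEMO C-grid V-file can be mocked up with plain xarray. This is only a minimal sketch: the names vo, depthv, y, x and time_counter follow common NEMO output conventions and are assumptions here, not copied from the uploaded files.

```python
import numpy as np
import xarray as xr

# Minimal mock of a NEMO C-grid V-file layout (names are assumptions
# based on common NEMO conventions, not taken from the real files).
nt, nz, ny, nx = 2, 3, 4, 5
ds_v = xr.Dataset(
    data_vars={
        "vo": (
            ("time_counter", "depthv", "y", "x"),
            np.zeros((nt, nz, ny, nx), dtype="float32"),
            {"units": "m/s", "long_name": "Sea water y velocity"},
        ),
    },
    coords={
        "depthv": ("depthv", np.array([0.5, 10.0, 50.0]), {"units": "m"}),
        "time_counter": ("time_counter", np.arange(nt)),
    },
)
print(ds_v)
```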
Just documenting what I see from a baroclinic gyre simulation with the MITgcm, when ingesting netCDF output with xarray and xgcm:
<xarray.Dataset> Size: 50GB
Dimensions: (Xp1: 481, Y: 480, Z: 15, X: 480, Yp1: 481, Zl: 15, T: 712)
Coordinates:
* Xp1 (Xp1) float64 4kB 0.0 1.004e+04 2.008e+04 ... 4.81e+06 4.82e+06
* Y (Y) float64 4kB 5.021e+03 1.506e+04 ... 4.805e+06 4.815e+06
* Z (Z) float64 120B -25.0 -80.0 -145.0 ... -1.52e+03 -1.705e+03
* X (X) float64 4kB 5.021e+03 1.506e+04 ... 4.805e+06 4.815e+06
* Yp1 (Yp1) float64 4kB 0.0 1.004e+04 2.008e+04 ... 4.81e+06 4.82e+06
* Zl (Zl) float64 120B 0.0 -50.0 -110.0 ... -1.43e+03 -1.61e+03
* T (T) float64 6kB 0.0 4.32e+05 8.64e+05 ... 3.067e+08 3.072e+08
iter (T) int32 3kB ...
Data variables:
U (T, Z, Y, Xp1) float32 10GB ...
V (T, Z, Yp1, X) float32 10GB ...
Temp (T, Z, Y, X) float32 10GB ...
S (T, Z, Y, X) float32 10GB ...
Eta (T, Y, X) float32 656MB ...
W (T, Zl, Y, X) float32 10GB ...
Attributes: (12/18)
MITgcm_version: checkpoint69e
build_user: joe
build_host: oram
build_date: Wed May 14 01:16:28 PM EDT 2025
MITgcm_URL: http://mitgcm.org
MITgcm_tag_id:
... ...
nSy: 1
nPx: 1
nPy: 24
Nx: 480
Ny: 480
Nr: 15
<xgcm.Grid>
X Axis (periodic, boundary=None):
* center X --> left
* left Xp1 --> center
Y Axis (periodic, boundary=None):
* center Y --> left
* left Yp1 --> center
Z Axis (periodic, boundary=None):
* center Z --> right
* right Zl --> center
For reference, this was generated with the following code:
#!/usr/bin/env python
import xarray as xr
import xgcm
# Open your netCDF file
ds = xr.open_dataset('state_0000000000.nc')
print(ds)
# Create the xgcm Grid object (example for a Cartesian grid)
grid = xgcm.Grid(ds, coords={'X': {'center': 'X', 'left': 'Xp1'},
'Y': {'center': 'Y', 'left': 'Yp1'},
'Z': {'center': 'Z', 'right': 'Zl'}})
print(grid)
Thanks for sharing the MITgcm output, @fluidnumerics-joe.
But since we're also interested in the metadata of each variable and coordinate, could you share the actual 'state_0000000000.nc' file with me so I can copy all the metadata into the parcels/_datasets/structured/circulation_models.py file?
@erikvansebille - here's a trimmed down MITgcm dataset created with MITgcm's built-in netcdf io
This can be read in with
#!/usr/bin/env python
import xarray as xr
import xgcm
# Open your netCDF file
ds = xr.open_dataset('state_0000000000.nc')
# Create the xgcm Grid object (example for a Cartesian grid)
grid = xgcm.Grid(ds, coords={'X': {'center': 'X', 'left': 'Xp1'},
'Y': {'center': 'Y', 'left': 'Yp1'},
'Z': {'center': 'Z', 'right': 'Zl'}})
I'm creating the same output but in the native metadata format; that will arrive soon.
I've now added datasets mimicking the layout of NEMO (both C- and A-grid), Hycom, ECCO4, MITgcm, FES, ERA5, CESM (aka POP), GlobCurrent and CROCO.
@VeckoTheGecko, I now realise that you also started #2014, which not only adds the datasets but also includes some other changes. Would it make sense to merge this PR with #2014 first? Or merge them both into v4-dev directly?
I don't think the dataset in that PR is that important to bring into here (this looks more fleshed out than that attempt) - but I think the tooling would be good to bring into here so I can work on that.
> I think the tooling would be good to bring into here so I can work on that.
Done
@erikvansebille not sure if you already have - would it be worth using the new tooling (compare_datasets at parcels/_datasets/utils.py) to double check the new datasets against the reference ones you have? (in case things got lost in translation)
No I haven't included any testing myself. Didn't realise you hadn't done it either. Can you add these tests (in a new PR)?
There is testing in the sense of importing the datasets (which is good, so we know that the datasets construct).
The tooling compare_datasets is just to compare the metadata of the constructed datasets against the local datasets from the providers - this wouldn't be a test that we'd put into CI but is a quick verification to check for erroneous metadata.
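The kind of metadata check described here can be sketched with plain xarray. The helper below is a hypothetical stand-in for illustration, not the actual compare_datasets in parcels/_datasets/utils.py: it compares dims, dtypes and attrs of shared variables, ignoring the values themselves.

```python
import numpy as np
import xarray as xr

def compare_metadata(constructed: xr.Dataset, reference: xr.Dataset) -> list[str]:
    """Report metadata mismatches between two datasets.

    Hypothetical stand-in for parcels' compare_datasets, for illustration only.
    """
    problems = []
    # Variables present in one dataset but not the other
    for name in set(constructed.variables) ^ set(reference.variables):
        problems.append(f"variable only in one dataset: {name}")
    # For shared variables, compare dims, dtype and attrs (not the values)
    for name in set(constructed.variables) & set(reference.variables):
        a, b = constructed[name], reference[name]
        if a.dims != b.dims:
            problems.append(f"{name}: dims {a.dims} != {b.dims}")
        if a.dtype != b.dtype:
            problems.append(f"{name}: dtype {a.dtype} != {b.dtype}")
        if dict(a.attrs) != dict(b.attrs):
            problems.append(f"{name}: attrs differ")
    return problems

# Identical layouts produce no findings
ds = xr.Dataset({"uo": (("time", "lat"), np.zeros((2, 3), dtype="float32"),
                        {"units": "m s-1"})})
print(compare_metadata(ds, ds.copy(deep=True)))
```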
> The tooling compare_datasets is just to compare the metadata of the constructed datasets against the local datasets from the providers - this wouldn't be a test that we'd put into CI but is a quick verification to check for erroneous metadata.
Ah, now I understand what you mean. I agree it's important to verify consistency. I'll do that on Monday when I'm back at my desk
Here's the same MITgcm dataset in MDS format. It can be read using xmitgcm, e.g.
import xmitgcm
# Open the baroclinic gyres dataset using xmitgcm
# The time-step for the simulation is 600 seconds
ds = xmitgcm.open_mdsdataset(
    data_dir="./mds/",
    grid_dir="./mds/",
    iters="all",
    prefix=["U", "V", "W", "T", "S"],
    read_grid=True,
    delta_t=600.0,
)
print(ds)
The output looks like:
<xarray.Dataset> Size: 271MB
Dimensions: (XC: 480, YC: 480, XG: 480, YG: 480, Z: 15, Zp1: 16, Zu: 15,
Zl: 15, time: 3)
Coordinates: (12/34)
* XC (XC) >f4 2kB 5.021e+03 1.506e+04 2.51e+04 ... 4.805e+06 4.815e+06
* YC (YC) >f4 2kB 5.021e+03 1.506e+04 2.51e+04 ... 4.805e+06 4.815e+06
* XG (XG) >f4 2kB 0.0 1.004e+04 2.008e+04 ... 4.79e+06 4.8e+06 4.81e+06
* YG (YG) >f4 2kB 0.0 1.004e+04 2.008e+04 ... 4.79e+06 4.8e+06 4.81e+06
* Z (Z) >f4 60B -25.0 -80.0 -145.0 ... -1.345e+03 -1.52e+03 -1.705e+03
* Zp1 (Zp1) >f4 64B 0.0 -50.0 -110.0 ... -1.43e+03 -1.61e+03 -1.8e+03
... ...
rhoRef (Z) >f4 60B dask.array<chunksize=(15,), meta=np.ndarray>
dyF (YC, XC) >f4 922kB dask.array<chunksize=(480, 480), meta=np.ndarray>
dxF (YC, XC) >f4 922kB dask.array<chunksize=(480, 480), meta=np.ndarray>
dyU (YG, XG) >f4 922kB dask.array<chunksize=(480, 480), meta=np.ndarray>
iter (time) int64 24B dask.array<chunksize=(1,), meta=np.ndarray>
* time (time) timedelta64[ns] 24B 0 days 5 days 10 days
Data variables:
W (time, Zl, YC, XC) float32 41MB dask.array<chunksize=(1, 15, 480, 480), meta=np.ndarray>
U (time, Z, YC, XG) float32 41MB dask.array<chunksize=(1, 15, 480, 480), meta=np.ndarray>
S (time, Z, YC, XC) float32 41MB dask.array<chunksize=(1, 15, 480, 480), meta=np.ndarray>
T (time, Z, YC, XC) float32 41MB dask.array<chunksize=(1, 15, 480, 480), meta=np.ndarray>
V (time, Z, YG, XC) float32 41MB dask.array<chunksize=(1, 15, 480, 480), meta=np.ndarray>
Attributes:
Conventions: CF-1.6
title: netCDF wrapper of MITgcm MDS binary data
source: MITgcm
history: Created by calling `open_mdsdataset(data_dir='./mds/', grid...
Note the difference in the coordinates. Additionally, notice that XG (f-points) and XC (tracer points) have the same length, which is different from the output of the NetCDF package in MITgcm. I thought we were going to wait to merge this until this other format was provided?