Adding real-world circulation model datasets
Adds circulation model datasets mimicking the layout of real-world hydrodynamic models.
- [x] Chose the correct base branch (main for v3 changes, v4-dev for v4 changes)
- [ ] Fixes #
- [ ] Added tests
- [ ] Added documentation
Here's a screenshot of the output of a 'real' copernicusmarine dataset:
import copernicusmarine
import xarray as xr
copernicusmarine.subset(
dataset_id="cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m",
variables=["uo", "vo"],
minimum_longitude=-4,
maximum_longitude=10,
minimum_latitude=50,
maximum_latitude=60,
start_datetime="2023-01-01T00:00:00",
end_datetime="2023-01-31T00:00:00",
minimum_depth=0.49402499198913574,
maximum_depth=453.9377136230469,
username=<HIDDEN>,
password=<HIDDEN>,
)
ds = xr.open_dataset("cmems_mod_glo_phy-cur_anfc_0.083deg_P1D-m_uo-vo_4.00W-10.00E_50.00N-60.00N_0.49-453.94m_2023-01-01-2023-01-31.nc")
ds
Just also uploaded two NEMO C-grid files, as taken from Mercator Ocean International. Here's a screenshot of the xarray output of the V file (the U file is similar, but also has a few extra 2D fields).
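For anyone without the files at hand, the general shape of such a NEMO C-grid V-file can be mocked up with plain xarray. This is only a minimal sketch: the names vo, depthv, y, x and time_counter follow common NEMO output conventions and are assumptions here, not copied from the uploaded files.

```python
import numpy as np
import xarray as xr

# Minimal mock of a NEMO C-grid V-file layout (names are assumptions
# based on common NEMO conventions, not taken from the real files).
nt, nz, ny, nx = 2, 3, 4, 5
ds_v = xr.Dataset(
    data_vars={
        "vo": (
            ("time_counter", "depthv", "y", "x"),
            np.zeros((nt, nz, ny, nx), dtype="float32"),
            {"units": "m/s", "long_name": "Sea water y velocity"},
        ),
    },
    coords={
        "depthv": ("depthv", np.array([0.5, 10.0, 50.0]), {"units": "m"}),
        "time_counter": ("time_counter", np.arange(nt)),
    },
)
print(ds_v)
```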
Just documenting what I see from a baroclinic gyre simulation with the MITgcm, when ingesting netCDF output with xarray and xgcm:
<xarray.Dataset> Size: 50GB
Dimensions: (Xp1: 481, Y: 480, Z: 15, X: 480, Yp1: 481, Zl: 15, T: 712)
Coordinates:
* Xp1 (Xp1) float64 4kB 0.0 1.004e+04 2.008e+04 ... 4.81e+06 4.82e+06
* Y (Y) float64 4kB 5.021e+03 1.506e+04 ... 4.805e+06 4.815e+06
* Z (Z) float64 120B -25.0 -80.0 -145.0 ... -1.52e+03 -1.705e+03
* X (X) float64 4kB 5.021e+03 1.506e+04 ... 4.805e+06 4.815e+06
* Yp1 (Yp1) float64 4kB 0.0 1.004e+04 2.008e+04 ... 4.81e+06 4.82e+06
* Zl (Zl) float64 120B 0.0 -50.0 -110.0 ... -1.43e+03 -1.61e+03
* T (T) float64 6kB 0.0 4.32e+05 8.64e+05 ... 3.067e+08 3.072e+08
iter (T) int32 3kB ...
Data variables:
U (T, Z, Y, Xp1) float32 10GB ...
V (T, Z, Yp1, X) float32 10GB ...
Temp (T, Z, Y, X) float32 10GB ...
S (T, Z, Y, X) float32 10GB ...
Eta (T, Y, X) float32 656MB ...
W (T, Zl, Y, X) float32 10GB ...
Attributes: (12/18)
MITgcm_version: checkpoint69e
build_user: joe
build_host: oram
build_date: Wed May 14 01:16:28 PM EDT 2025
MITgcm_URL: http://mitgcm.org
MITgcm_tag_id:
... ...
nSy: 1
nPx: 1
nPy: 24
Nx: 480
Ny: 480
Nr: 15
<xgcm.Grid>
X Axis (periodic, boundary=None):
* center X --> left
* left Xp1 --> center
Y Axis (periodic, boundary=None):
* center Y --> left
* left Yp1 --> center
Z Axis (periodic, boundary=None):
* center Z --> right
* right Zl --> center
For reference, this was generated with the following code:
#!/usr/bin/env python
import xarray as xr
import xgcm
# Open your netCDF file
ds = xr.open_dataset('state_0000000000.nc')
print(ds)
# Create the xgcm Grid object (example for a Cartesian grid)
grid = xgcm.Grid(ds, coords={'X': {'center': 'X', 'left': 'Xp1'},
'Y': {'center': 'Y', 'left': 'Yp1'},
'Z': {'center': 'Z', 'right': 'Zl'}})
print(grid)
Thanks for sharing the MITgcm output, @fluidnumerics-joe.
But since we're also interested in the metadata of each variable and coordinate, could you share the actual 'state_0000000000.nc' file with me so I can copy all the metadata into the parcels/_datasets/structured/circulation_models.py file?
@erikvansebille - here's a trimmed down MITgcm dataset created with MITgcm's built-in netcdf io
This can be read in with
#!/usr/bin/env python
import xarray as xr
import xgcm
# Open your netCDF file
ds = xr.open_dataset('state_0000000000.nc')
# Create the xgcm Grid object (example for a Cartesian grid)
grid = xgcm.Grid(ds, coords={'X': {'center': 'X', 'left': 'Xp1'},
'Y': {'center': 'Y', 'left': 'Yp1'},
'Z': {'center': 'Z', 'right': 'Zl'}})
I'm creating the same output but in the native metadata format; that will arrive soon.
I've now added datasets mimicking the layout of NEMO (both C- and A-grid), Hycom, ECCO4, MITgcm, FES, ERA5, CESM (aka POP), GlobCurrent and CROCO.
@VeckoTheGecko, I now realise that you also started #2014, which not only adds the datasets but also includes some other changes. Would it make sense to merge this PR with #2014 first? Or merge them both into v4-dev directly?
I don't think the dataset in that PR is that important to bring into here (this looks more fleshed out than that attempt) - but I think the tooling would be good to bring into here so I can work on that.
> I think the tooling would be good to bring into here so I can work on that.
Done
@erikvansebille not sure if you already have - would it be worth using the new tooling (compare_datasets at parcels/_datasets/utils.py) to double check the new datasets against the reference ones you have? (in case things got lost in translation)
No I haven't included any testing myself. Didn't realise you hadn't done it either. Can you add these tests (in a new PR)?
There is testing in the sense of importing the datasets (which is good, so we know that the datasets construct).
The tooling compare_datasets is just to compare the metadata of the constructed datasets against the local datasets from the providers - this wouldn't be a test that we'd put into CI but is a quick verification to check for erroneous metadata.
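The kind of metadata check described here can be sketched with plain xarray. The helper below is a hypothetical stand-in for illustration, not the actual compare_datasets in parcels/_datasets/utils.py: it compares dims, dtypes and attrs of shared variables, ignoring the values themselves.

```python
import numpy as np
import xarray as xr

def compare_metadata(constructed: xr.Dataset, reference: xr.Dataset) -> list[str]:
    """Report metadata mismatches between two datasets.

    Hypothetical stand-in for parcels' compare_datasets, for illustration only.
    """
    problems = []
    # Variables present in one dataset but not the other
    for name in set(constructed.variables) ^ set(reference.variables):
        problems.append(f"variable only in one dataset: {name}")
    # For shared variables, compare dims, dtype and attrs (not the values)
    for name in set(constructed.variables) & set(reference.variables):
        a, b = constructed[name], reference[name]
        if a.dims != b.dims:
            problems.append(f"{name}: dims {a.dims} != {b.dims}")
        if a.dtype != b.dtype:
            problems.append(f"{name}: dtype {a.dtype} != {b.dtype}")
        if dict(a.attrs) != dict(b.attrs):
            problems.append(f"{name}: attrs differ")
    return problems

# Identical layouts produce no findings
ds = xr.Dataset({"uo": (("time", "lat"), np.zeros((2, 3), dtype="float32"),
                        {"units": "m s-1"})})
print(compare_metadata(ds, ds.copy(deep=True)))
```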
> The tooling compare_datasets is just to compare the metadata of the constructed datasets against the local datasets from the providers - this wouldn't be a test that we'd put into CI but is a quick verification to check for erroneous metadata.
Ah, now I understand what you mean. I agree it's important to verify consistency. I'll do that on Monday when I'm back at my desk
Here's the same MITgcm dataset in MDS format. It can be read using xmitgcm, e.g.
import xmitgcm
# Open the baroclinic gyres dataset using xmitgcm
# The time-step for the simulation is 600 seconds
ds = xmitgcm.open_mdsdataset(
    data_dir="./mds/",
    grid_dir="./mds/",
    iters="all",
    prefix=["U", "V", "W", "T", "S"],
    read_grid=True,
    delta_t=600.0,
)
print(ds)
The output looks like:
<xarray.Dataset> Size: 271MB
Dimensions: (XC: 480, YC: 480, XG: 480, YG: 480, Z: 15, Zp1: 16, Zu: 15,
Zl: 15, time: 3)
Coordinates: (12/34)
* XC (XC) >f4 2kB 5.021e+03 1.506e+04 2.51e+04 ... 4.805e+06 4.815e+06
* YC (YC) >f4 2kB 5.021e+03 1.506e+04 2.51e+04 ... 4.805e+06 4.815e+06
* XG (XG) >f4 2kB 0.0 1.004e+04 2.008e+04 ... 4.79e+06 4.8e+06 4.81e+06
* YG (YG) >f4 2kB 0.0 1.004e+04 2.008e+04 ... 4.79e+06 4.8e+06 4.81e+06
* Z (Z) >f4 60B -25.0 -80.0 -145.0 ... -1.345e+03 -1.52e+03 -1.705e+03
* Zp1 (Zp1) >f4 64B 0.0 -50.0 -110.0 ... -1.43e+03 -1.61e+03 -1.8e+03
... ...
rhoRef (Z) >f4 60B dask.array<chunksize=(15,), meta=np.ndarray>
dyF (YC, XC) >f4 922kB dask.array<chunksize=(480, 480), meta=np.ndarray>
dxF (YC, XC) >f4 922kB dask.array<chunksize=(480, 480), meta=np.ndarray>
dyU (YG, XG) >f4 922kB dask.array<chunksize=(480, 480), meta=np.ndarray>
iter (time) int64 24B dask.array<chunksize=(1,), meta=np.ndarray>
* time (time) timedelta64[ns] 24B 0 days 5 days 10 days
Data variables:
W (time, Zl, YC, XC) float32 41MB dask.array<chunksize=(1, 15, 480, 480), meta=np.ndarray>
U (time, Z, YC, XG) float32 41MB dask.array<chunksize=(1, 15, 480, 480), meta=np.ndarray>
S (time, Z, YC, XC) float32 41MB dask.array<chunksize=(1, 15, 480, 480), meta=np.ndarray>
T (time, Z, YC, XC) float32 41MB dask.array<chunksize=(1, 15, 480, 480), meta=np.ndarray>
V (time, Z, YG, XC) float32 41MB dask.array<chunksize=(1, 15, 480, 480), meta=np.ndarray>
Attributes:
Conventions: CF-1.6
title: netCDF wrapper of MITgcm MDS binary data
source: MITgcm
history: Created by calling `open_mdsdataset(data_dir='./mds/', grid...
Note the difference in the coordinates. Additionally, notice that XG (f-points) and XC (tracer points) have the same length, which is different from the output of the NetCDF package in MITgcm. I thought we were going to wait to merge this until this other format was provided?