cosima-cookbook
Degeneracy in variable name
While looking for a mapping from variable name to `long_name`, `standard_name` and `units`, I found some troubling inconsistencies:
https://github.com/ACCESS-NRI/experiment_metadb/issues/3#issuecomment-1728884698
The `variables` table in the database has the following schema:

```sql
CREATE TABLE variables (
    id INTEGER NOT NULL,
    name VARCHAR NOT NULL,
    long_name VARCHAR,
    standard_name VARCHAR,
    units VARCHAR,
    PRIMARY KEY (id)
);
CREATE INDEX ix_variables_name ON variables (name);
CREATE UNIQUE INDEX ix_variables_name_long_name_units ON variables (name, long_name, units);
```
Arguably the table should also have `model` and `realm` columns, included in the unique index, in case of variable name clashes between models and sub-models. When the database was originally conceived it only stored COSIMA data, so everything came from the same model, and AFAIK there were no variable name overlaps between CICE and MOM5.

However, if other experiment types are stored in the DB, variable name clashes become more likely.
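To make the suggestion concrete, here is a minimal sketch of what the extended table could look like. The `model` and `realm` columns, the index name and the `definitions` helper are all hypothetical; none of them exist in the current database.

```python
import sqlite3

# Hypothetical extension of the `variables` schema: `model` and `realm` do not
# exist in the current table. Putting them in the unique index lets the same
# variable name carry different metadata in different (sub-)models.
SCHEMA = """
CREATE TABLE variables (
    id INTEGER NOT NULL,
    name VARCHAR NOT NULL,
    long_name VARCHAR,
    standard_name VARCHAR,
    units VARCHAR,
    model VARCHAR,
    realm VARCHAR,
    PRIMARY KEY (id)
);
CREATE INDEX ix_variables_name ON variables (name);
CREATE UNIQUE INDEX ix_variables_model_realm_name_long_name_units
    ON variables (model, realm, name, long_name, units);
"""


def make_db() -> sqlite3.Connection:
    """Create an empty in-memory database with the extended schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def definitions(conn: sqlite3.Connection, name: str, model: str, realm: str):
    """Return (long_name, standard_name, units) rows for `name` in one model/realm."""
    return conn.execute(
        "SELECT long_name, standard_name, units FROM variables "
        "WHERE name = ? AND model = ? AND realm = ? ORDER BY id",
        (name, model, realm),
    ).fetchall()
```

A query on `name` alone would still return every definition. Scoping the lookup by `model` and `realm` is what removes the ambiguity.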
Looking for variable names that have more than one definition turns up some troubling examples:

```
sqlite> select * from variables where name not like '%time%' and name in (select name from variables group by name having count(*) > 1);
...
802|vh|Meridional Thickness Flux||m3 s-1
161|vh|Meridional thickness flux||m3 s-1
...
932|zoo|||
515|zoo|zoo||mmol/m^3
698|zoo|zoo||none
897|zoo|zooplankton||mmol/m^3
```
So `vh` is defined with long names that differ only in capitalisation!? How does that happen?
There are four distinct versions of the `zoo` (zooplankton) variable? How does this happen?
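This at least explains how the database ends up holding both rows, though not why the files carry different long names. The unique index on `(name, long_name, units)` compares text with SQLite's default `BINARY` collation, which is case-sensitive, so the two `vh` long names count as different. SQLite also treats NULLs as distinct from one another in a UNIQUE index, so rows with blank fields such as `932` can never collide. The sketch below loads the rows shown above and flags groups that collapse once case is ignored. It assumes the blank fields are NULL rather than empty strings, and `case_insensitive_clashes` is a hypothetical helper name.

```python
import sqlite3

# The `variables` schema from above, minus the non-unique name index.
SCHEMA = """
CREATE TABLE variables (
    id INTEGER NOT NULL,
    name VARCHAR NOT NULL,
    long_name VARCHAR,
    standard_name VARCHAR,
    units VARCHAR,
    PRIMARY KEY (id)
);
CREATE UNIQUE INDEX ix_variables_name_long_name_units
    ON variables (name, long_name, units);
"""

# Rows copied from the query output above; blank fields are assumed to be NULL.
ROWS = [
    (802, "vh", "Meridional Thickness Flux", None, "m3 s-1"),
    (161, "vh", "Meridional thickness flux", None, "m3 s-1"),
    (932, "zoo", None, None, None),
    (515, "zoo", "zoo", None, "mmol/m^3"),
    (698, "zoo", "zoo", None, "none"),
    (897, "zoo", "zooplankton", None, "mmol/m^3"),
]


def load(conn: sqlite3.Connection) -> None:
    """Create the table and insert ROWS; the unique index accepts all of them."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO variables VALUES (?, ?, ?, ?, ?)", ROWS)


def case_insensitive_clashes(conn: sqlite3.Connection):
    """Return (name, units, count) for groups that collapse once long_name case is ignored."""
    return conn.execute(
        """
        SELECT name, units, COUNT(*)
        FROM variables
        WHERE long_name IS NOT NULL
        GROUP BY name, lower(long_name), units
        HAVING COUNT(*) > 1
        ORDER BY name
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn)
    print(case_insensitive_clashes(conn))  # [('vh', 'm3 s-1', 2)]
```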
Here are some examples of the four different `zoo` variables:
id | path |
---|---|
897 | /g/data/ik11/outputs/access-om2/1deg_iamip2_his/output056/ocean/oceanbgc-3d-zoo-1-yearly-mean-y_2014.nc |
515 | /g/data/ik11/outputs/access-om2/1deg_jra55_iaf_omip2_cycle5/output288/ocean/ocean_bgc_ann.nc |
698 | /g/data/ik11/outputs/access-om2-025/025deg_jra55_ryf9091_bgc/restart050/ocean/csiro_bgc.res.nc |
932 | /g/data/ik11/outputs/access-om2/1deg_iamip2_CMCC-ESM2ssp126/restart070/ice/csiro_bgc.res.nc |
The latter two are restart files, though it's a bit odd that one is in the `ice` subdirectory and the other is in `ocean`.
The first two are a bit of a mystery. Was there a code update for the `1deg_iamip2_his` experiment? It looks like it was run with a bespoke build by @hakaseh:
https://github.com/hakaseh/1deg_jra55_iaf/blob/iamip2-his/manifests/exe.yaml#L15
The query for this:

```sql
select variables.id, variables.name, experiment, root_dir, ncfile
from experiments
join ncfiles on experiments.id = ncfiles.experiment_id
join ncvars on ncvars.ncfile_id = ncfiles.id
join variables on ncvars.variable_id = variables.id
where variables.name = 'zoo';
```
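The same query can be run from Python with the variable name passed as a bound parameter instead of pasted into the SQL. This is a minimal sketch under some assumptions not confirmed here: `root_dir` is a column of `experiments`, `ncfile` is a path in `ncfiles` relative to it, and joining the two gives full paths like those in the table above. `variable_files` is a hypothetical helper name.

```python
import posixpath
import sqlite3

# The query above, with the variable name as a bound parameter.
QUERY = """
SELECT variables.id, variables.name, experiment, root_dir, ncfile
FROM experiments
JOIN ncfiles ON experiments.id = ncfiles.experiment_id
JOIN ncvars ON ncvars.ncfile_id = ncfiles.id
JOIN variables ON ncvars.variable_id = variables.id
WHERE variables.name = ?
ORDER BY variables.id
"""


def variable_files(conn: sqlite3.Connection, name: str):
    """Return (variable_id, experiment, full_path) for each file containing `name`.

    Assumes full paths are root_dir joined with the relative ncfile path.
    """
    return [
        (var_id, experiment, posixpath.join(root_dir, ncfile))
        for var_id, _name, experiment, root_dir, ncfile in conn.execute(QUERY, (name,))
    ]
```

For example, `variable_files(sqlite3.connect(db_path), "zoo")`, where `db_path` stands for the database file location (not given here). Listing the `variables.id` next to each path is what groups the files by definition.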
@aekiss, should potential temperature and conservative temperature have different variable names? Or are they the same at the surface?

```
792|surface_temp|Conservative temperature|sea_surface_conservative_temperature|K
1453|surface_temp|Conservative temperature||deg_C
1618|surface_temp|Potential temperature|sea_surface_temperature|degrees K
```
Potential and conservative temperature differ at the surface, so yes, they should have distinct names.
I just talked to Andrew: apparently in MOM you can choose either potential or conservative temperature as the prognostic variable, but the variable name does not change; only the long name differs.
This is unfortunate for anyone who wants to build databases mapping variable names to long names, standard names and units. AFAICT it means such lookup tables have to be experiment-specific. Doh.
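An experiment-specific lookup can be as simple as keying on `(experiment, name)` instead of `name`. This is a minimal sketch, assuming the rows come from the join query above extended to also select `long_name`, `standard_name` and `units`. `build_lookup`, `ambiguous`, `VarMeta` and the experiment names `expt_a`/`expt_b` are hypothetical, and which experiments actually produced rows 792 and 1618 is not stated here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Set, Tuple


class VarMeta(NamedTuple):
    long_name: Optional[str]
    standard_name: Optional[str]
    units: Optional[str]


Key = Tuple[str, str]  # (experiment, variable name)
Row = Tuple[str, str, Optional[str], Optional[str], Optional[str]]


def build_lookup(rows: Iterable[Row]) -> Dict[Key, Set[VarMeta]]:
    """Map (experiment, name) to every distinct definition seen for it.

    A set is kept rather than a single value so that any ambiguity left
    within one experiment is visible instead of being silently overwritten.
    """
    lookup: Dict[Key, Set[VarMeta]] = defaultdict(set)
    for experiment, name, long_name, standard_name, units in rows:
        lookup[(experiment, name)].add(VarMeta(long_name, standard_name, units))
    return dict(lookup)


def ambiguous(lookup: Dict[Key, Set[VarMeta]]) -> List[Key]:
    """Return the (experiment, name) keys that still have more than one definition."""
    return sorted(key for key, metas in lookup.items() if len(metas) > 1)


if __name__ == "__main__":
    # Hypothetical experiment names; metadata copied from rows 792 and 1618 above.
    rows = [
        ("expt_a", "surface_temp", "Conservative temperature",
         "sea_surface_conservative_temperature", "K"),
        ("expt_b", "surface_temp", "Potential temperature",
         "sea_surface_temperature", "degrees K"),
    ]
    lookup = build_lookup(rows)
    print(lookup[("expt_a", "surface_temp")])
    print(ambiguous(lookup))  # []
```

Keying by experiment resolves the `surface_temp` case. `ambiguous` is a cheap check that no experiment still maps one name to several definitions.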
I agree that it is odd that `csiro_bgc.res.nc` is saved in both the `ice` and `ocean` subdirectories. Only one is needed.
I didn't remember changing the long names, but judging by the commit history, they were added by @aekiss:
https://github.com/hakaseh/1deg_jra55_iaf/commit/7deb65a28d8db15fac57548b242b67ad46ab48dd