Degeneracy in variable name

Open aidanheerdegen opened this issue 1 year ago • 5 comments

While looking for a mapping from variable name to long_name, standard_name and units, I found some troubling inconsistencies:

https://github.com/ACCESS-NRI/experiment_metadb/issues/3#issuecomment-1728884698

The variables table in the database has the following schema:

CREATE TABLE variables (
        id INTEGER NOT NULL, 
        name VARCHAR NOT NULL, 
        long_name VARCHAR, 
        standard_name VARCHAR, 
        units VARCHAR, 
        PRIMARY KEY (id)
);
CREATE INDEX ix_variables_name ON variables (name);
CREATE UNIQUE INDEX ix_variables_name_long_name_units ON variables (name, long_name, units);

Arguably this should also have indexed columns for model and realm, in case of variable name clashes between sub-models and models. As originally conceived, the database only stored COSIMA data, so the model was always the same, and AFAIK there were no variable name overlaps between CICE and MOM5.

However, if any other experiment types are stored in the DB, variable name clashes become more likely.
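
As a rough illustration of that suggestion, here is a minimal sketch using Python's built-in sqlite3 module. The `model` and `realm` columns, the index name, and the two rows are all hypothetical and invented for this example; nothing here is taken from the actual cookbook schema beyond the existing column names.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE variables (
    id INTEGER NOT NULL,
    name VARCHAR NOT NULL,
    model VARCHAR,          -- hypothetical column, e.g. 'MOM5' or 'CICE'
    realm VARCHAR,          -- hypothetical column, e.g. 'ocean' or 'ice'
    long_name VARCHAR,
    standard_name VARCHAR,
    units VARCHAR,
    PRIMARY KEY (id)
);
-- hypothetical index covering the disambiguating columns
CREATE INDEX ix_variables_name_model_realm ON variables (name, model, realm);
""")

# Made-up rows: the same variable name emitted by two different sub-models.
con.executemany(
    "INSERT INTO variables VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "example_var", "MOM5", "ocean", "illustrative ocean field", None, "1"),
        (2, "example_var", "CICE", "ice", "illustrative ice field", None, "1"),
    ],
)

# With model and realm available, a lookup by name is no longer ambiguous.
rows = con.execute(
    "SELECT id, long_name FROM variables"
    " WHERE name = ? AND model = ? AND realm = ?",
    ("example_var", "MOM5", "ocean"),
).fetchall()
print(rows)
```

This only addresses clashes between models or realms. It would not help where a single model writes the same name with different metadata.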

If you look for variable names that occur with more than one definition, there are some troubling examples:

sqlite> select * from variables where name not like '%time%' and name in (select name from variables group by name having count(*) > 1);
...
802|vh|Meridional Thickness Flux||m3 s-1
161|vh|Meridional thickness flux||m3 s-1
...
932|zoo|||
515|zoo|zoo||mmol/m^3
698|zoo|zoo||none
897|zoo|zooplankton||mmol/m^3

So vh is defined with slightly different long names!? How does that happen?

There are four distinct versions of the zoo (zooplankton) variable? How does this happen?
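
At the schema level, at least part of the answer can be reproduced in isolation. The sketch below loads the six rows shown above into an in-memory SQLite database with the same table and unique index. It assumes the empty fields in the output are NULL (they could also be empty strings), and the id 9999 is made up for the final check. It shows that the unique index on (name, long_name, units) compares strings case-sensitively, so the two vh rows count as different definitions, while an exact repeat is rejected.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE variables (
    id INTEGER NOT NULL,
    name VARCHAR NOT NULL,
    long_name VARCHAR,
    standard_name VARCHAR,
    units VARCHAR,
    PRIMARY KEY (id)
);
CREATE UNIQUE INDEX ix_variables_name_long_name_units
    ON variables (name, long_name, units);
""")

# Rows copied from the query output above; empty fields assumed to be NULL.
sample_rows = [
    (802, "vh", "Meridional Thickness Flux", None, "m3 s-1"),
    (161, "vh", "Meridional thickness flux", None, "m3 s-1"),
    (932, "zoo", None, None, None),
    (515, "zoo", "zoo", None, "mmol/m^3"),
    (698, "zoo", "zoo", None, "none"),
    (897, "zoo", "zooplankton", None, "mmol/m^3"),
]
# All six inserts succeed: each (name, long_name, units) triple is distinct.
con.executemany("INSERT INTO variables VALUES (?, ?, ?, ?, ?)", sample_rows)

# Names that appear with more than one definition.
counts = dict(con.execute(
    "SELECT name, COUNT(*) FROM variables GROUP BY name HAVING COUNT(*) > 1"
))

# Ignoring case, the two vh long names collapse to a single value.
vh_long_names_ignoring_case = con.execute(
    "SELECT COUNT(DISTINCT lower(long_name)) FROM variables WHERE name = 'vh'"
).fetchone()[0]

# An exact repeat of an existing triple violates the unique index.
try:
    con.execute(
        "INSERT INTO variables VALUES (?, ?, ?, ?, ?)",
        (9999, "vh", "Meridional thickness flux", None, "m3 s-1"),  # made-up id
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(counts, vh_long_names_ignoring_case, rejected)
```

This explains why the database accepts both vh rows, but not why the source files carry differently capitalised long names in the first place.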

aidanheerdegen avatar Sep 21 '23 05:09 aidanheerdegen

Here are some examples of the four different zoo variables

id path
897 /g/data/ik11/outputs/access-om2/1deg_iamip2_his/output056/ocean/oceanbgc-3d-zoo-1-yearly-mean-y_2014.nc
515 /g/data/ik11/outputs/access-om2/1deg_jra55_iaf_omip2_cycle5/output288/ocean/ocean_bgc_ann.nc
698 /g/data/ik11/outputs/access-om2-025/025deg_jra55_ryf9091_bgc/restart050/ocean/csiro_bgc.res.nc
932 /g/data/ik11/outputs/access-om2/1deg_iamip2_CMCC-ESM2ssp126/restart070/ice/csiro_bgc.res.nc

The latter two are restart files, though it's a bit odd one is in the ice subdirectory, and the other is in ocean.

The first two are a bit of a mystery. Was there a code update for the 1deg_iamip2_his experiment? Looks like it was done with a bespoke build by @hakaseh:

https://github.com/hakaseh/1deg_jra55_iaf/blob/iamip2-his/manifests/exe.yaml#L15

The query for this:

select variables.id, variables.name, experiment, root_dir, ncfile 
from experiments  
        join ncfiles on experiments.id = ncfiles.experiment_id 
        join ncvars on ncvars.ncfile_id = ncfiles.id 
        join variables on  ncvars.variable_id = variables.id 
where variables.name = 'zoo';

aidanheerdegen avatar Sep 21 '23 06:09 aidanheerdegen

@aekiss should potential temperature and conservative temperature have different variable names? Or are they the same at the surface?

792|surface_temp|Conservative temperature|sea_surface_conservative_temperature|K
1453|surface_temp|Conservative temperature||deg_C
1618|surface_temp|Potential temperature|sea_surface_temperature|degrees K

aidanheerdegen avatar Sep 21 '23 06:09 aidanheerdegen

Potential and conservative temperature are different at the surface, so yes they should have distinct names.

aekiss avatar Sep 21 '23 07:09 aekiss

Just talked to Andrew, and apparently with MOM you can choose to have potential or conservative temperature as the prognostic variable, but the actual variable name does not change, though the long name will differ.

This is unfortunate for people who want to create databases mapping variable names to long names, standard names and units.

This means such lookup tables have to be experiment-specific, AFAICT. Doh.
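
A minimal sketch of what an experiment-specific lookup could look like, using Python's sqlite3 module and the same joins as the zoo query earlier in this thread. The table layouts are cut down to just the columns that query uses, and the experiment names, paths and file names are hypothetical. Only the two surface_temp definitions (ids 792 and 1618) are taken from the output above, and which experiment each belongs to is invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Cut-down tables: only the columns referenced by the join are included.
con.executescript("""
CREATE TABLE experiments (id INTEGER PRIMARY KEY, experiment VARCHAR, root_dir VARCHAR);
CREATE TABLE ncfiles (id INTEGER PRIMARY KEY, experiment_id INTEGER, ncfile VARCHAR);
CREATE TABLE ncvars (ncfile_id INTEGER, variable_id INTEGER);
CREATE TABLE variables (id INTEGER PRIMARY KEY, name VARCHAR NOT NULL,
                        long_name VARCHAR, standard_name VARCHAR, units VARCHAR);
""")
con.executemany("INSERT INTO variables VALUES (?, ?, ?, ?, ?)", [
    (792, "surface_temp", "Conservative temperature",
     "sea_surface_conservative_temperature", "K"),
    (1618, "surface_temp", "Potential temperature",
     "sea_surface_temperature", "degrees K"),
])
# Hypothetical experiments and files, invented for this example.
con.executemany("INSERT INTO experiments VALUES (?, ?, ?)", [
    (1, "exp_conservative", "/hypothetical/exp_conservative"),
    (2, "exp_potential", "/hypothetical/exp_potential"),
])
con.executemany("INSERT INTO ncfiles VALUES (?, ?, ?)",
                [(10, 1, "ocean.nc"), (20, 2, "ocean.nc")])
con.executemany("INSERT INTO ncvars VALUES (?, ?)", [(10, 792), (20, 1618)])


def variable_lookup(con, experiment):
    """Map variable name -> list of (long_name, standard_name, units) for one experiment."""
    query = """
        SELECT DISTINCT variables.name, long_name, standard_name, units
        FROM experiments
            JOIN ncfiles ON experiments.id = ncfiles.experiment_id
            JOIN ncvars ON ncvars.ncfile_id = ncfiles.id
            JOIN variables ON ncvars.variable_id = variables.id
        WHERE experiments.experiment = ?
    """
    lookup = {}
    for name, long_name, standard_name, units in con.execute(query, (experiment,)):
        # Keep a list: one experiment may still hold several definitions of a name.
        lookup.setdefault(name, []).append((long_name, standard_name, units))
    return lookup


print(variable_lookup(con, "exp_potential"))
```

The values are lists rather than single tuples because, as the zoo restart files show, even one experiment can contain more than one definition of the same name.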

aidanheerdegen avatar Sep 21 '23 07:09 aidanheerdegen

Here are some examples of the four different zoo variables

id path
897 /g/data/ik11/outputs/access-om2/1deg_iamip2_his/output056/ocean/oceanbgc-3d-zoo-1-yearly-mean-y_2014.nc
515 /g/data/ik11/outputs/access-om2/1deg_jra55_iaf_omip2_cycle5/output288/ocean/ocean_bgc_ann.nc
698 /g/data/ik11/outputs/access-om2-025/025deg_jra55_ryf9091_bgc/restart050/ocean/csiro_bgc.res.nc
932 /g/data/ik11/outputs/access-om2/1deg_iamip2_CMCC-ESM2ssp126/restart070/ice/csiro_bgc.res.nc

The latter two are restart files, though it's a bit odd one is in the ice subdirectory, and the other is in ocean.

I agree that it is odd that csiro_bgc.res.nc is saved in both ice and ocean subdirectories. Only one is needed.

The first two are a bit of a mystery. Was there a code update for the 1deg_iamip2_his experiment? Looks like it was done with a bespoke build by @hakaseh:

https://github.com/hakaseh/1deg_jra55_iaf/blob/iamip2-his/manifests/exe.yaml#L15

I don't remember changing the long names, but according to the commit history they were added by @aekiss:

https://github.com/hakaseh/1deg_jra55_iaf/commit/7deb65a28d8db15fac57548b242b67ad46ab48dd

hakaseh avatar Sep 22 '23 13:09 hakaseh