xmitgcm
Errors identifying files in run directory
On the MITgcm mailing list, @mjlosch reported some problems parsing the files in his run directory:
The diagnostics package produces a file cheapaml at iteration 11 and iceDiag at 12. I acknowledge that this is hard to catch in generic reading software. But when I remove the cheapaml (and iceDiag) files and retry, I still get:
OSError: Could not find the expected file prefixes ['W', 'Eta', 'ice_Tice1', 'ice_iceH', 'ice_Qice2', 'ice_snowH', 'S', 'ice_Tsrf', 'ice_Qice1', 'T', 'ice_snowAge', 'ice_fract', 'V', 'VICE', 'U', 'ice_Tice2', 'UICE'] at iternum 12. (Instead found ['ice_Tsrf', 'ice_Qice1', 'T', 'V', 'ice_Tice2', 'Eta', 'Qnet', 'ice_frwAtm', 'Qsw', 'EmPmR', 'ice_flxAtm', 'W', 'PH', 'ice_Tice1', 'ice_Qice2', 'FU', 'ice_snowH', 'VICE', 'U', 'PHL', 'ice_iceH', 'S', 'FV', 'ice_snowAge', 'ice_fract', 'UICE'])
To me this looks like the same list of files but in different order. ignore_unknown_vars=True does not work.
He then elaborated:
cd MITgcm/verification
./testreport -t global_ocean.cs32x15
cd global_ocean.cs32x15/tr_run.icedyn
ipython --pylab
import xmitgcm as xm
d = xm.open_mdsdataset('./', geometry='curvilinear')
gives
OSError: Could not find the expected file prefixes ['ice_Tice2', 'W', 'ice_fract', 'U', 'ice_snowH',
'ice_Qice2', 'ice_Tsrf', 'ice_Qice1', 'Eta', 'ice_Tice1', 'V', 'ice_iceH', 'S', 'ice_snowAge', 'T'] at iternum 72010. (Instead found ['ice_iceH', 'ice_Tice2', 'U', 'ice_fract', 'W', 'ice_flxAtm', 'PHL', 'ice_Qice2', 'ice_frwAtm', 'V', 'ice_Tsrf', 'Eta', 'ice_Tice1', 'ice_snowH', 'ice_Qice1', 'S', 'ice_snowAge', 'T', 'PH'])
import glob
vlist = ['ice_Tice2', 'W', 'ice_fract', 'U', 'ice_snowH', 'ice_Qice2', 'ice_Tsrf', 'ice_Qice1', 'Eta', 'ice_Tice1', 'V', 'ice_iceH', 'S', 'ice_snowAge', 'T']
for v in vlist: print(glob.glob(v+'.*.meta'))
['ice_Tice2.0000072010.meta', 'ice_Tice2.0000072000.meta']
['W.0000072010.meta', 'W.0000072000.meta']
['ice_fract.0000072000.meta', 'ice_fract.0000072010.meta']
['U.0000072010.meta', 'U.0000072000.meta']
['ice_snowH.0000072000.meta', 'ice_snowH.0000072010.meta']
['ice_Qice2.0000072010.meta', 'ice_Qice2.0000072000.meta']
['ice_Tsrf.0000072010.meta', 'ice_Tsrf.0000072000.meta']
['ice_Qice1.0000072010.meta', 'ice_Qice1.0000072000.meta']
['Eta.0000072010.meta', 'Eta.0000072000.meta']
['ice_Tice1.0000072010.meta', 'ice_Tice1.0000072000.meta']
['V.0000072010.meta', 'V.0000072000.meta']
['ice_iceH.0000072010.meta', 'ice_iceH.0000072000.meta']
['S.0000072010.meta', 'S.0000072000.meta']
['ice_snowAge.0000072010.meta', 'ice_snowAge.0000072000.meta']
['T.0000072010.meta', 'T.0000072000.meta']
So basically all variables have the same “frequency” (a record at 72000 and at 72010), but it still does not work. I tried this:
d = xm.open_mdsdataset('./',prefix = ['ice_Tice2', 'W', 'ice_fract', 'U', 'ice_snowH', 'ice_Qice2', 'ice_Tsrf', 'ice_Qice1', 'Eta', 'ice_Tice1', 'V', 'ice_iceH', 'S', 'ice_snowAge', 'T'],geometry='curvilinear')
gives
KeyError: "Couln't find metadata for variable ice_Tice2 and `ignore_unknown_vars`==False."
but the meta files are all there (see above)
d = xm.open_mdsdataset('./',prefix = ['ice_Tice2', 'W', 'ice_fract', 'U', 'ice_snowH', 'ice_Qice2', 'ice_Tsrf', 'ice_Qice1', 'Eta', 'ice_Tice1', 'V', 'ice_iceH', 'S', 'ice_snowAge', 'T'],ignore_unknown_vars=True,geometry='curvilinear')
works, but only U,V,W,T,S,Eta are imported and all variables starting with ‘ice_’ are apparently unknown. Why is this so?
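The glob check above can be turned around to list which prefixes exist at each iteration. The following is a minimal sketch, not part of xmitgcm: the helper name `prefixes_by_iteration` is hypothetical, and it assumes file names of the form `<prefix>.<iteration>.meta`, as in the listing above. The sample file list is illustrative only.

```python
from collections import defaultdict


def prefixes_by_iteration(filenames):
    """Map each iteration number to the set of file prefixes found for it."""
    found = defaultdict(set)
    for name in filenames:
        parts = name.rsplit('.', 2)
        # Skip anything that is not <prefix>.<iteration>.meta
        # (for example grid files without an iteration number).
        if len(parts) != 3 or parts[2] != 'meta' or not parts[1].isdigit():
            continue
        found[int(parts[1])].add(parts[0])
    return dict(found)


# Illustrative sample; in a run directory one could pass glob.glob('*.meta').
sample = [
    'T.0000072000.meta', 'T.0000072010.meta',
    'ice_fract.0000072000.meta', 'ice_fract.0000072010.meta',
    'PH.0000072010.meta',
]
groups = prefixes_by_iteration(sample)
for iternum in sorted(groups):
    print(iternum, sorted(groups[iternum]))
```

A prefix that appears at only some iterations shows up immediately as a difference between the per-iteration sets.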
To me this looks like the same list of files but in different order.
The order should not matter, since the two lists are converted to sets before comparison: https://github.com/xgcm/xmitgcm/blob/13352a50c3a28c2fb036728a606ff2806f4bd139/xmitgcm/mds_store.py#L159-L164
We will have to get to the bottom of this...
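As a quick check of the ordering question, the two lists from the second traceback (iternum 72010) can be compared as sets. This is only a sketch using plain Python sets; it is not the xmitgcm code at the link above, and it does not by itself explain why the expected list was built the way it was.

```python
# Lists copied from the OSError message at iternum 72010.
expected = ['ice_Tice2', 'W', 'ice_fract', 'U', 'ice_snowH', 'ice_Qice2',
            'ice_Tsrf', 'ice_Qice1', 'Eta', 'ice_Tice1', 'V', 'ice_iceH',
            'S', 'ice_snowAge', 'T']
found = ['ice_iceH', 'ice_Tice2', 'U', 'ice_fract', 'W', 'ice_flxAtm',
         'PHL', 'ice_Qice2', 'ice_frwAtm', 'V', 'ice_Tsrf', 'Eta',
         'ice_Tice1', 'ice_snowH', 'ice_Qice1', 'S', 'ice_snowAge', 'T',
         'PH']

# Set differences ignore order and show only real mismatches.
missing = set(expected) - set(found)
extra = set(found) - set(expected)

print('missing:', sorted(missing))
print('extra:  ', sorted(extra))
```

For these lists every expected prefix is present, but the found list contains four additional prefixes (PH, PHL, ice_flxAtm, ice_frwAtm), so the two sets differ regardless of order.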
all variables starting with ‘ice_’ are apparently unknown. Why is this so?
Are you using the diagnostics package to create these files? Or are they part of the "native" seaice output?
The variables starting with "ice_" are "native" thsice variables. I chose the verification experiment global_ocean.cs32x15/tr_run.icedyn because there is no output from the diagnostics package cluttering the run directory (as opposed to my first example, verification_other/offline_cheapaml).
Ok, so that is a standalone issue: xmitgcm cannot read the ice_* variables because it doesn't know what they contain. The metadata has to be added manually, as done here for the KPP native output:
https://github.com/xgcm/xmitgcm/blob/master/xmitgcm/variables.py#L404
We would love to have a pull request from you that adds the necessary metadata.
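For illustration only, an entry for one of the thsice fields might look like the sketch below. The layout (a mapping from file prefix to `dims` and `attrs`) is an assumption modelled on how the existing entries in variables.py appear to be written. The dictionary name `hypothetical_thsice_variables`, the dimension names, and the attribute values are placeholders, and should be checked against the linked file and the MITgcm documentation before use.

```python
from collections import OrderedDict

# Hypothetical sketch, not the actual xmitgcm source: every name and value
# below is a placeholder chosen for illustration.
hypothetical_thsice_variables = OrderedDict(
    ice_fract=dict(
        dims=['j', 'i'],  # assumed: 2D field at cell centres
        attrs=dict(
            long_name='Sea ice fractional area',  # assumed description
            units='1',  # assumed: dimensionless fraction
        ),
    ),
)

for prefix, meta in hypothetical_thsice_variables.items():
    print(prefix, meta['dims'], meta['attrs']['units'])
```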
We generally prefer the diagnostic output because it is "self describing"; xmitgcm parses available_diagnostics.log to determine everything it needs to know. This is clearly preferable to manually keeping track of the metadata within xmitgcm itself. But we want both possibilities to be supported.
Whatever is happening with the original error you described is trickier. I still don't understand it. Maybe @raphaeldussin can dig in when he has time.
Hi Ryan, so it's not really a bug, but a "feature", i.e. there is a list of "known" variables and most packages are not represented in this list. I could start adding "native" output variables to the list (e.g. of the seaice and thsice packages), but eventually this list would become as long or longer than the OrderedDict of state_variables. Is that what you want?
Is that what you want?
Thanks for providing the seaice state variables in #96. That was a lot of work on your part! It definitely doesn't hurt to have this info in there. The only downside is that we / you are now responsible for maintaining it if it changes. For this reason, it is preferable to work with the diagnostics output.
I am still eager to resolve the original error related to the inconsistent parsing of the "expected file prefixes".