Variable mapping table and the catalog file

Open MaceKuailv opened this issue 2 years ago • 4 comments

I found it somewhat difficult to open and work with new datasets in oceanspy when they are not in the standard MITgcm output format.

I am suggesting several functionalities that might come in handy:

  • Oceanspy should be able to guess the meaning of variables based on name and long_name.
  • The od object contains a dictionary called aliases that shows which variables got mapped to which standard variable. It would be nice to print out this table when calling od. Ideally, oceanspy should be able to generate this object on its own based on the guesses. And of course, users can change this table.
  • The catalog file should contain the aliasing information. I also think it would be nice to have an alternative to the yaml format, for example a csv file that can be edited in Excel.
  • The documentation for set_aliases and manipulate_coords is a bit hard to find. I also think they should be called in the open_oceandataset function.
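To make the csv idea in the third item concrete, here is a minimal sketch of an alias table round-tripped through csv text. It uses only the Python standard library and does not call oceanspy. The column names (standard_name, dataset_name), the helper functions, and the example variable names are assumptions made for illustration, not an existing oceanspy format.

```python
import csv
import io

# Hypothetical alias table: standard (MITgcm-style) name -> name used in the dataset.
aliases = {"XG": "lon_u", "YG": "lat_v", "Temp": "theta"}


def aliases_to_csv(aliases):
    """Serialize an alias dictionary to csv text with one row per variable."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["standard_name", "dataset_name"])  # assumed header
    for standard, custom in sorted(aliases.items()):
        writer.writerow([standard, custom])
    return buf.getvalue()


def aliases_from_csv(text):
    """Parse csv text (as written above) back into an alias dictionary."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["standard_name"]: row["dataset_name"] for row in reader}


csv_text = aliases_to_csv(aliases)
print(csv_text)  # this is also the table a user could edit in a spreadsheet
roundtrip = aliases_from_csv(csv_text)
```

The same text doubles as the printable table from the second item, and the parsed dictionary is the kind of object one would then hand to set_aliases.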

MaceKuailv avatar Jun 12 '22 10:06 MaceKuailv

Thanks for the suggestions! It seems that the 2nd item is easiest to implement. Please go ahead and provide a working example. The first and third items look more involved. How do you envisage accomplishing them? Are they needed now or for future releases with support for more models? For the final item: How does this change current functionality?

ThomasHaine avatar Jun 14 '22 17:06 ThomasHaine

The hardest one to implement is probably number 3. There really are a lot of configurations to fill in when creating a new dataset, and the easiest way to do it is debatable.

No.1 can be implemented easily with some word2vec packages (it's also not hard to implement without them).
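As a rough illustration of the "without them" route, the sketch below guesses a standard name by plain string similarity on name and long_name, using difflib from the standard library instead of word2vec. The reference table, its descriptive phrases, the function names, and the 0.6 threshold are all assumptions picked for the example; a real table would need to cover every variable oceanspy expects.

```python
from difflib import SequenceMatcher

# Hypothetical reference table: standard name -> descriptive phrase.
STANDARD = {
    "Temp": "potential temperature",
    "S": "salinity",
    "U": "zonal velocity",
    "XG": "longitude of cell corner",
}


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def guess_alias(name, long_name, threshold=0.6):
    """Return the best-matching standard name, or None if nothing is close enough."""
    best, best_score = None, 0.0
    for standard, description in STANDARD.items():
        # Compare the variable name with the standard name, and the
        # long_name with the description; keep the better of the two.
        score = max(similarity(name, standard), similarity(long_name, description))
        if score > best_score:
            best, best_score = standard, score
    return best if best_score >= threshold else None


print(guess_alias("thetao", "sea water potential temperature"))
print(guess_alias("salt", "Salinity"))
```

Guesses below the threshold come back as None, so they can be left for the user to fill in by hand rather than silently mapped to the wrong variable.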

As for No.4, I think an alternative to calling set_aliases and manipulate_coords automatically is to print out a reminder or raise a warning: "This dataset is missing XG (the longitude of U-velocity). Some methods may require these variable(s) to work. Call set_aliases or manipulate_coords to fill it in."
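A minimal sketch of such a check, assuming a hypothetical table of required variables and a hypothetical helper check_required; neither exists in oceanspy, and the description given for YG is only illustrative. It uses the standard warnings module and takes the set of variable names already present in the dataset.

```python
import warnings

# Hypothetical subset of required standard variables and their descriptions.
REQUIRED = {
    "XG": "the longitude of U-velocity",
    "YG": "the latitude of V-velocity",  # illustrative description
}


def check_required(available, required=REQUIRED):
    """Warn about required variables that are absent and return their names."""
    missing = [name for name in required if name not in available]
    if missing:
        details = ", ".join(f"{name} ({required[name]})" for name in missing)
        warnings.warn(
            f"This dataset is missing {details}. Some methods may require "
            "these variable(s) to work. Call set_aliases or manipulate_coords "
            "to fill them in.",
            UserWarning,
            stacklevel=2,
        )
    return missing


# Example: a dataset that has YG and Temp but no XG.
missing = check_required({"YG", "Temp"})
```

A warning rather than an error keeps datasets usable for methods that do not need the missing variables.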

I would suggest those changes be made before the next release. Right now, the dataset files that have a catalog file have all been renamed in the MITgcm fashion (I don't know how it was done, but I assume by hand). Those changes will require somewhere between 100 and 500 lines of code, which is not that much, and I think they will help bring the package to a much broader audience. I think it is pretty cost-effective.

MaceKuailv avatar Jun 15 '22 03:06 MaceKuailv

Inspired by this issue and #274, we should be able to add a function to oceanspy that lists all available datasets on SciServer at some point.

asiddi24 avatar Nov 22 '22 20:11 asiddi24

See also @malmans2's second comment here: https://github.com/hainegroup/oceanspy/issues/224#issuecomment-1057806034

ThomasHaine avatar Jan 17 '23 20:01 ThomasHaine