intake-esm

Function of derived registry

Status: Open. wachsylon opened this issue 3 years ago (2 comments).

Hi,

I do not understand the design/function of the derived variable registry. I use the official release 2022.9.18.

  • It seems that I cannot search for derived variables. If I use derived_variables as a search key, I get an error saying the key is not in the query model. This is already a problem because derived_variable is shown as a key of the catalog, but I cannot use it, e.g. for unique. If I use variable as the key instead, a derived variable name that I can see when browsing is not found either.
  • If I subset with variable-dependent attributes, which reduces the number of variables in the result, the derived_variables are not reduced according to their dependencies on those variables. That means I do not know which variables will be in the final dataset when calling to_dataset_dict.

So why are derived_variables separated from normal variables at all?

Best, and thanks in advance for any introduction,
Fabi

wachsylon · Nov 15 '22 09:11

@wachsylon, the derived variable registry is an experimental feature, and I wouldn't be surprised if some cases don't work.

Would you mind sharing the code, snippets, or reproducible examples you are working with?

andersy005 · Nov 16 '22 00:11

I fixed some things, and unique now works. However, searching on derived_variables still does not work. Having separate searches for variables and for derived variables would make things easier, because otherwise it is hard to explain to users what a search on variable returns. If you only have the variable key, it is not obvious:

  • if the searched variable is part of a derived variable, is the derived one included in the search result?
  • if the derived variable depends on variables, are these included?
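To make these two questions concrete, here is a minimal, dependency-free sketch of one possible semantics for separate variable and derived_variable search keys. It is not intake-esm's API. The CATALOG rows, the REGISTRY mapping, and the search function are hypothetical stand-ins, and the sketch assumes each derived variable simply declares the source variables it needs.

```python
# Hypothetical toy model, not intake-esm's API: each catalog row stands
# for one asset, and the registry maps a derived variable to the source
# variables it is computed from.
CATALOG = [
    {"variable": "ua", "path": "ua.nc"},
    {"variable": "va", "path": "va.nc"},
    {"variable": "tas", "path": "tas.nc"},
]
REGISTRY = {"wind_speed": ["ua", "va"]}


def search(catalog, registry, variable=(), derived_variable=()):
    """Search with separate keys for source and derived variables.

    variable: selects source variables only; derived variables that
        merely use them are NOT added to the result.
    derived_variable: selects derived variables and pulls in every
        source variable they depend on, since those must be loaded.
    Returns (matching catalog rows, sorted derived variable names).
    """
    unknown = [dv for dv in derived_variable if dv not in registry]
    if unknown:
        raise KeyError(f"not in the derived variable registry: {unknown}")
    wanted = set(variable)
    for dv in derived_variable:
        wanted.update(registry[dv])
    rows = [row for row in catalog if row["variable"] in wanted]
    return rows, sorted(derived_variable)


rows, derived = search(CATALOG, REGISTRY, variable=["ua"])
print([r["variable"] for r in rows], derived)  # ['ua'] []

rows, derived = search(CATALOG, REGISTRY, derived_variable=["wind_speed"])
print([r["variable"] for r in rows], derived)  # ['ua', 'va'] ['wind_speed']
```

With two keys, each question has one explicit answer. A variable search never adds derived variables, and a derived_variable search always brings in its dependencies.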

If I open a derived variable as an xarray dataset, the source variables probably have to be in the dataset. However, if I am only interested in the variable itself, why should I get the derived ones in the dataset too?
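One way to get this behavior is to compute a derived variable only when it is requested, and to add its sources alongside it. Requesting a plain variable then never adds derived ones. The sketch below uses plain dicts and lists as stand-ins for xarray Datasets and DataArrays so that it stays self-contained. The DERIVE table and the build_dataset helper are hypothetical and are not part of intake-esm.

```python
import math

# Hypothetical derivation table: derived name -> (source variables, function).
# Plain lists stand in for xarray DataArrays to keep the sketch self-contained.
DERIVE = {
    "wind_speed": (
        ["ua", "va"],
        lambda src: [math.hypot(u, v) for u, v in zip(src["ua"], src["va"])],
    ),
}


def build_dataset(sources, requested, keep_sources=True):
    """Return a dict containing only what the user asked for.

    sources: source variables already loaded from the catalog subset.
    requested: names the user asked for. A derived variable is computed
        only if it is requested; its sources are kept alongside it unless
        keep_sources is False. Requesting a source variable never adds
        the derived variables that use it.
    """
    out = {}
    for name in requested:
        if name in DERIVE:
            deps, func = DERIVE[name]
            out[name] = func({d: sources[d] for d in deps})
            if keep_sources:
                out.update({d: sources[d] for d in deps})
        else:
            out[name] = sources[name]
    return out


sources = {"ua": [3.0, 0.0], "va": [4.0, 2.0], "tas": [280.0, 281.0]}
print(sorted(build_dataset(sources, ["wind_speed"])))  # ['ua', 'va', 'wind_speed']
print(build_dataset(sources, ["ua"]))  # {'ua': [3.0, 0.0]}
```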

My second point is related to this. I think the registry must be updated whenever the variables are subset, no matter what the search query was. The registry should not contain derived variables that cannot be created from the subset catalog.
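The proposed rule is a simple subset check: a derived variable stays in the registry only if all of its source variables survive the search. Below is a minimal sketch, assuming the registry can be reduced to a mapping from derived variable name to the set of source variables it needs. REGISTRY, prune_registry, and the variable names are hypothetical illustrations, not intake-esm internals.

```python
# Hypothetical toy registry: derived variable -> source variables it needs.
REGISTRY = {
    "wind_speed": {"ua", "va"},
    "tas_anomaly": {"tas"},
    "heat_index": {"tas", "hurs"},
}


def prune_registry(registry, available):
    """Keep only derived variables whose sources are all in `available`.

    `available` is the collection of variables present in the subset
    catalog, whatever query produced it.
    """
    available = set(available)
    return {
        name: deps for name, deps in registry.items() if deps <= available
    }


# Variables left in the catalog subset after some search.
subset_variables = {"ua", "va", "tas"}
pruned = prune_registry(REGISTRY, subset_variables)
print(sorted(pruned))  # ['tas_anomaly', 'wind_speed']
```

If this were applied after every search, the listed derived_variables would always match what can actually be built. A user could then tell in advance which derived variables to_dataset_dict can still produce. In this example heat_index is dropped because hurs is not in the subset.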

wachsylon · Nov 16 '22 14:11