cf-xarray

Guess `cell_measures`?

Open dcherian opened this issue 3 years ago • 4 comments

We could automatically assign the cell_measures attribute by looking for variables with standard_name cell_thickness or cell_area, and setting it on any data variable or coordinate variable whose dimensions are a superset of the metric variable's dimensions.
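A minimal sketch of this heuristic in plain Python. It assumes each variable is given as a `(dims, attrs)` pair; for an xarray Dataset, `{name: (var.dims, var.attrs) for name, var in ds.variables.items()}` produces that shape. The function name `guess_cell_measures` and the `thickness` measure key are hypothetical: CF itself only defines the `area` and `volume` measures.

```python
from typing import Dict, Mapping, Tuple

# (dims, attrs) per variable name
Variables = Mapping[str, Tuple[Tuple[str, ...], Mapping[str, str]]]

# Hypothetical standard_name -> measure key table. CF only defines the
# "area" and "volume" measures, so the "thickness" key is an assumption.
MEASURE_KEYS = {"cell_area": "area", "cell_thickness": "thickness"}


def guess_cell_measures(variables: Variables) -> Dict[str, str]:
    """Return {variable name: cell_measures string} for every variable whose
    dimensions include all dimensions of at least one measure variable."""
    # Measure variables are recognised only by their standard_name.
    measures = {
        name: (MEASURE_KEYS[attrs["standard_name"]], set(dims))
        for name, (dims, attrs) in variables.items()
        if attrs.get("standard_name") in MEASURE_KEYS
    }
    guessed: Dict[str, str] = {}
    for name, (dims, _attrs) in variables.items():
        if name in measures:
            continue  # a measure is not assigned to itself
        entries = [
            f"{key}: {measure_name}"
            for measure_name, (key, measure_dims) in sorted(measures.items())
            # the measure's dimensions must be a subset of this variable's
            if measure_dims <= set(dims)
        ]
        if entries:
            guessed[name] = " ".join(entries)
    return guessed


variables = {
    "areacello": (("y", "x"), {"standard_name": "cell_area"}),
    "thkcello": (("lev", "y", "x"), {"standard_name": "cell_thickness"}),
    "tos": (("time", "y", "x"), {}),
    "so": (("time", "lev", "y", "x"), {}),
    "x": (("x",), {}),
}
print(guess_cell_measures(variables))
# {'tos': 'area: areacello', 'so': 'area: areacello thickness: thkcello'}
```

If two measures of the same kind match one variable (e.g. two thickness variables), this sketch simply lists both; picking one of them needs an extra rule.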

We could add this to a .cf.guess_cf_attributes(cell_measures=True, coord_axis=True) method that would replace .cf.guess_coord_axis.

Related #200

dcherian, Apr 13 '21 02:04

Yes, I think this would be very useful. I do it for most of my model output, as the time-invariant measures are usually stored in separate netCDF files.

A couple of thoughts:

  • Let's do the same for coordinates? I.e., after guessing coords assign the coordinates attribute to all data_vars?
  • Add a standard_names argument? If False, assume that all standard names are already in good shape and assign measures and coords if the other arguments are True. I.e., skip the guessing part based on regex.
  • Add regex for measures as well?
  • Consider as cell measures all variables whose standard name starts with cell_? This would allow for measures not covered by the CF conventions, e.g. dX/dY.
  • There are cases where multiple measures can be assigned to the same variable. E.g., free surface models often store thickness at rest and time-varying thickness. Assign the time-varying measure (measure with the greatest number of dimensions) and raise a warning?
  • Another option could be to have separate methods: .cf.guess_standard_names(verbose) and .cf.assign_measures(verbose)/.cf.assign_coordinates(verbose).

malmans2, Apr 13 '21 06:04

cc @jbusecke.

I'm really torn on the cell_measures stuff. It's basically xgcm-lite :) since you can do ds.cf.weighted("area").mean(), but you can't go too far because there's no convention for cell lengths. I think some of these issues would be solved by making it easier to set up an xgcm grid object.
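With cell_measures set, `ds.cf.weighted("area")` uses the area variable as weights, and the mean it computes is an ordinary weighted mean. A plain-Python sketch of that arithmetic on flat lists standing in for DataArrays; the NaN handling (missing values dropped from both numerator and weight sum, zero total weight giving NaN) is meant to roughly mirror xarray's default behaviour and is an assumption of this sketch, not a copy of its code.

```python
import math
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Return sum(w * x) / sum(w), ignoring points where the value is NaN.

    NaN values are dropped from both the numerator and the sum of weights,
    and a zero total weight yields NaN.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    valid = [(x, w) for x, w in zip(values, weights) if not math.isnan(x)]
    total_weight = sum(w for _, w in valid)
    if total_weight == 0:
        return math.nan
    return sum(x * w for x, w in valid) / total_weight


# Three cells with areas 1, 3 and 100; the large cell has no data.
print(weighted_mean([1.0, 3.0, math.nan], [1.0, 3.0, 100.0]))  # 2.5
```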

@jbusecke, @malmans2, @aulemahal, is there value in trying to set up a group call about this? I think there's an opportunity for better metadata handling across gridded, xgcm, sgrid, and xesmf, with cf_xarray providing a thin layer of helper functions to consolidate the code.

Let's do the same for coordinates? I.e., after guessing coords assign the coordinates attribute to all data_vars?

I think this makes sense.
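A sketch of what assigning the coordinates attribute could look like, assuming data variables and coordinates are given as hypothetical name-to-dims mappings. Each data variable lists the non-dimension coordinates (including scalar ones) whose dimensions are a subset of its own; the function name is illustrative only.

```python
from typing import Dict, Mapping, Tuple


def guess_coordinates_attr(
    data_vars: Mapping[str, Tuple[str, ...]],
    coords: Mapping[str, Tuple[str, ...]],
) -> Dict[str, str]:
    """Return {data variable: value for its `coordinates` attribute}."""
    result: Dict[str, str] = {}
    for name, dims in data_vars.items():
        aux = sorted(
            coord_name
            for coord_name, coord_dims in coords.items()
            # skip dimension coordinates such as x(x); keep auxiliary ones
            # (including scalars) whose dims are a subset of the variable's
            if coord_dims != (coord_name,) and set(coord_dims) <= set(dims)
        )
        if aux:
            result[name] = " ".join(aux)
    return result


coords = {"time": ("time",), "x": ("x",), "lat": ("y", "x"), "lon": ("y", "x")}
print(guess_coordinates_attr({"tos": ("time", "y", "x")}, coords))
# {'tos': 'lat lon'}
```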

Add a standard_names argument? ... I.e., skip the guessing part based on regex.

OK

Add regex for measures as well?

Yes, I think that would be fine, e.g. matching areacello and thkcello (and maybe other common choices).
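A sketch of name-based guessing for measures, using hypothetical patterns for CMIP-style names such as areacello/areacella and thkcello and mapping them to the CF standard names cell_area and cell_thickness. `fullmatch` is used so that names which merely start with a pattern are not picked up.

```python
import re
from typing import Dict, Iterable

# Hypothetical patterns; extend with other common naming choices as needed.
MEASURE_NAME_PATTERNS = {
    "cell_area": re.compile(r"areacell[a-z]?"),
    "cell_thickness": re.compile(r"thkcell[a-z]?"),
}


def guess_measure_standard_names(names: Iterable[str]) -> Dict[str, str]:
    """Map each variable name that fully matches a pattern to a standard_name."""
    guessed: Dict[str, str] = {}
    for name in names:
        for standard_name, pattern in MEASURE_NAME_PATTERNS.items():
            if pattern.fullmatch(name):
                guessed[name] = standard_name
                break  # first matching pattern wins
    return guessed


print(guess_measure_standard_names(["areacello", "thkcello", "tos", "areacella"]))
# {'areacello': 'cell_area', 'thkcello': 'cell_thickness', 'areacella': 'cell_area'}
```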

Consider as cell measures all variables whose standard name starts with cell_? This would allow for measures not covered by the CF conventions, e.g. dX/dY.

This I am unsure about. It is annoying that CF doesn't describe length measures (except for cell_thickness). I think the long-term solution is to have xgcm parse the sgrid conventions (https://sgrid.github.io/sgrid/), with cf_xarray providing helper functions.

There are cases where multiple measures can be assigned to the same variable. E.g., free surface models often store thickness at rest and time-varying thickness. Assign the time-varying measure (measure with the greatest number of dimensions) and raise a warning?

This sounds OK as a heuristic.
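A sketch of that heuristic: among candidate measures (given here as a hypothetical name-to-dims mapping), pick the one with the most dimensions and warn when there was a choice to make. The variable names `dz_rest` and `dz` are illustrative stand-ins for thickness at rest and time-varying thickness.

```python
import warnings
from typing import Mapping, Tuple


def pick_measure(candidates: Mapping[str, Tuple[str, ...]]) -> str:
    """Return the candidate measure with the most dimensions, warning if there
    was more than one to choose from."""
    if not candidates:
        raise ValueError("no candidate measures")
    # Iterate in sorted order so ties are broken deterministically by name.
    chosen = max(sorted(candidates), key=lambda name: len(candidates[name]))
    if len(candidates) > 1:
        ignored = sorted(name for name in candidates if name != chosen)
        warnings.warn(
            f"multiple cell measures match; using {chosen!r}, ignoring {ignored}",
            stacklevel=2,
        )
    return chosen


# Thickness at rest vs. time-varying thickness in a free-surface model.
print(pick_measure({"dz_rest": ("z", "y", "x"), "dz": ("time", "z", "y", "x")}))
```

This prints `dz` and emits a UserWarning naming `dz_rest` as ignored.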

Another option could be to have separate methods

Yeah, I think the cell_measures guessing you propose is complicated enough that it should be a separate method.

dcherian, Apr 13 '21 16:04

I think a call would be very beneficial for sorting out which parts should live where. Getting all these possibilities sorted out between the different packages would go a long way toward avoiding duplication!

jbusecke, Apr 13 '21 17:04

There are cases where multiple measures can be assigned to the same variable. E.g., free surface models often store thickness at rest and time-varying thickness. Assign the time-varying measure (measure with the greatest number of dimensions) and raise a warning?

I am actually doing something very similar in my cmip6_preprocessing package, and again, I'd love to get this factored out into cf_xarray. For me, the meeting would be most useful for getting an overview of where each of these packages overlap.

jbusecke, Apr 13 '21 17:04