cf-xarray icon indicating copy to clipboard operation
cf-xarray copied to clipboard

Add support for 'flag_masks'

Open Descanonge opened this issue 1 year ago • 3 comments

#199

Now support CF flag variables encoding for mutually exclusive flags ('flag_values' attribute), independent flags ('flag_masks' attribute) or a mix of both (both attributes present).

  • Change create_flag_dict(). The dictionnary values are tuples of both the 'flag_masks' and 'flag_values'. Either are set to None if not present in attributes.
  • Add extract_flags() to retrieve boolean masks of any type of flag. Variable name is that of the 'flag_meaning' extracted. Can extract a single flag (returns a DataArray) or multiple ones (returns a Dataset).
  • Change implementation of eq, ne, isin. All are thin wrappers around extract_flags().
  • Change repr to list flags depending on their type (mutually exclusive or independent). Do not list the corresponding value anymore (it becomes complicated when mixing both types...).

It breaks rich comparisons. I do not understand in what situation they should be used. If it is necessary to keep them, they can be easily limited to work for mutually exclusive flags only.

Tests and documentation were not changed. If the implementation proposed is accepted, I can work on it as well.

Descanonge avatar Jul 22 '22 15:07 Descanonge

Thanks @Descanonge I'll take a look soon.

dcherian avatar Jul 28 '22 15:07 dcherian

If it is necessary to keep them, they can be easily limited to work for mutually exclusive flags only.

Yes we should keep the rich comparisons. Those are the failing tests. da.cf == "Altantic Ocean" where "Atlantic Ocean" is in flag_meanings.

What do you think of adding a new flags property as discussed in #199,

Then da.cf == "Atlantic Ocean" and da.cf.flags["Atlantic Ocean"] would be identical. The latter syntax would generalize to flag masks too.

dcherian avatar Aug 09 '22 21:08 dcherian

I adapted the code for the interface proposed in the issue. The flags property returns a Dataset containing all boolean masks. For instance this allows:

da.cf.flags.low_battery & ~da.cf.flags.maintenance_mode

I have kept rich comparisons so we can still do da.cf == 'low_battery' or da.cf.isin(['low_battery']).

The dataset is lazily evaluated but not cached, so unless storing in a variable (flags = da.cf.flags) we add some overhead...

If the implementation is satisfactory I will fix the tests and docs.

Edit: Also I moved all methods in the CFDataArrayAccessor, since I figured all of this should only operate on a single flag variable.

Descanonge avatar Aug 15 '22 17:08 Descanonge

Sorry for the large delay here. This is a great contribution, and I'm excited to merge it.

I have a few requests but think this is quite close. It would also be nice to add documentation but that could be in a future PR.

dcherian avatar Dec 08 '22 03:12 dcherian

Thanks for the review, there are plenty of things to improve that's for sure. I am finishing writing my phd manuscript for the coming month so I am not super available but I'll try looking at it.

Descanonge avatar Feb 22 '23 23:02 Descanonge

Ah no worries. Good luck with finishing your PhD manuscript!

dcherian avatar Feb 23 '23 03:02 dcherian

What do you think of these reprs?

Plain text

image image image image

Rich

image image image image

dcherian avatar Feb 23 '23 04:02 dcherian

Codecov Report

Merging #354 (e99c30f) into main (fa09644) will decrease coverage by 0.18%. The diff coverage is 87.98%.

@@            Coverage Diff             @@
##             main     #354      +/-   ##
==========================================
- Coverage   86.42%   86.25%   -0.18%     
==========================================
  Files          22       22              
  Lines        3994     4148     +154     
  Branches      194      214      +20     
==========================================
+ Hits         3452     3578     +126     
- Misses        491      512      +21     
- Partials       51       58       +7     
Flag Coverage Δ
mypy 37.68% <37.07%> (+0.07%) :arrow_up:
unittests 95.99% <95.51%> (-0.35%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cf_xarray/formatting.py 74.17% <66.66%> (-2.38%) :arrow_down:
cf_xarray/datasets.py 75.04% <86.95%> (+0.56%) :arrow_up:
cf_xarray/accessor.py 86.45% <90.41%> (-0.42%) :arrow_down:
cf_xarray/tests/test_accessor.py 94.14% <100.00%> (+0.21%) :arrow_up:

codecov[bot] avatar May 10 '23 03:05 codecov[bot]