cf-xarray
cf-xarray copied to clipboard
Support `flag_masks`
From http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#flags
int basin(lat, lon);
standard_name: region;
flag_values: 1, 2, 3;
flag_meanings:"atlantic_arctic_ocean indo_pacific_ocean global_ocean";
data:
basin: 1, 1, 1, 1, 2, ..... ;
It'd be nice to provide .cf.__eq__ and allow
ds.where(basin.cf == "atlantic_arctic_ocean") # becomes ds.where(basin == 1)
or ds.where(basin.cf.flag == "atlantic_arctic_ocean"))?
We could do something similar for .isin too
I don't understand flag_masks but we could probably do something similar
byte sensor_status_qc(time, depth, lat, lon) ;
sensor_status_qc:long_name = "Sensor Status" ;
sensor_status_qc:_FillValue = 0b ;
sensor_status_qc:valid_range = 1b, 63b ;
sensor_status_qc:flag_masks = 1b, 2b, 4b, 8b, 16b, 32b ;
sensor_status_qc:flag_meanings = "low_battery processor_fault
memory_fault disk_fault
software_fault
maintenance_required" ;
Am I right, that this does not work yet? I am about to work over at https://github.com/jbusecke/cmip6_preprocessing/issues/119 for a bit, and just wanted to check.
No but you can implement it!
How are you planning to use it? in .where?
Yes. This is a prototype of the code I am writing right now:
from cmip6_preprocessing.regionmask import merged_mask, _default_merge_dict
def basin_mask(ds):
basins = regionmask.defined_regions.natural_earth.ocean_basins_50
basins = merged_mask(basins, ds)
# add cf compatible metadata
basins.name = 'basin'
flag_meanings = [m.replace(' ','_').lower() for m in list(_default_merge_dict().keys())]
flag_values = np.arange(len(flag_meanings)).astype(str)
basins.attrs.update({'flag_values':','.join(flag_values), 'flag_meanings':' '.join(flag_meanings)})
return basins
def mask_basin(ds, basin, drop=False):
mask = basin_mask(ds)
# this should be done with cf_xarray once https://github.com/xarray-contrib/cf-xarray/issues/199 is implemented.
# but for now some hardcoded hack...
flag_meanings = mask.attrs['flag_meanings'].split(' ')
flag_values = mask.attrs['flag_values'].split(',')
flag_zipped = zip(flag_meanings, flag_values)
print(flag_zipped)
if basin not in flag_meanings:
raise ValueError(f'Did not find basin [{basin}] in given values [{flag_meanings}]')
else:
flag_matched = [f for f in flag_zipped if f[0] == basin][0]
print(flag_matched)
ds_masked = ds.where(mask==int(flag_matched[1]), drop=drop)
ds_masked.attrs['basin_masked'] = flag_matched[0]
return ds_masked
This works very nicely in my work context (with any CMIP6 model):
ds = data_dict['CMIP.MIROC.MIROC6.historical.Omon.gn']
# basin_test = basin_mask(ds)
ds_masked = mask_basin(ds, 'southern_ocean')
plt.figure()
ds.thetao.isel(time=0, lev=0, member_id=0).plot()
plt.figure()
ds_masked.thetao.isel(time=0, lev=0, member_id=0).plot()


I would like to replace these lines
flag_meanings = mask.attrs['flag_meanings'].split(' ')
flag_values = mask.attrs['flag_values'].split(',')
flag_zipped = zip(flag_meanings, flag_values)
print(flag_zipped)
if basin not in flag_meanings:
raise ValueError(f'Did not find basin [{basin}] in given values [{flag_meanings}]')
else:
flag_matched = [f for f in flag_zipped if f[0] == basin][0]
print(flag_matched)
ds_masked = ds.where(mask==int(flag_matched[1]), drop=drop)
with e.g.
ds_masked = ds.where(mask.cf == basin)
Thought?
Looks nice. I think I like the more explicit version with a CfFlags class:
mask.cf.flag == basinandmask.cf.flag.isin(["atlantic_ocean", "pacific_ocean"])
We should also add a flags section to the repr.
PR?
I dont think I can put one in today (real WTF energy over here haha). But would love to do this at a later point!
Leaving this open for flag masks. Contributions welcome!
Oh nice, didn't know about this - maybe I could use this for 2D masks in regionmask somehow. (It's always a bit a bummer that there is no way to identify regions by name for 2D masks.)
Can you provide an example of what you want?
The documentation is decent: https://cf-xarray.readthedocs.io/en/latest/flags.html
@mathause if we could work functionality like in my example above into regionmask that would be amazing. Together with this I might be able to completely remove the regionmask module from cmip6_preprocessing (I would really like that!).
Let me know if you would like some review, comments or anything else that would help.
Can you provide an example of what you want?
regionmask encodes the region a grid point with a number. So this would allow to also know the name of the region going along with the number. Very similar to the example given in the documentation. So I guess flag_masks is not required for this (but I am not entirely sure I correctly understand this part.)
if we could work functionality like in my example above into regionmask that would be amazing.
@jbusecke: which part do you mean? Should we discuss this in regionmask/regionmask#346?
I think flag masks are more for quality control stuff but I don't really understand them either
@jbusecke: which part do you mean? Should we discuss this in https://github.com/regionmask/regionmask/issues/346?
Sorry this was not clear. What I meant was that if both https://github.com/nvkelso/natural-earth-vector/issues/697 and https://github.com/regionmask/regionmask/issues/346 would work, I would not have to do any of this in cmip6_pp (or maybe just provide a thin wrapper. That would be amazing! Ill move over to your issue and ref this for further discussion.
Hi! I needed to extract flags masks recently and delved into it a bit. I never used cf-xarray but I am up for working on adding support for flag_masks.
I stumbled upon this bit of code that can unpack (binary) flags values into a array of booleans. I guess it can adapted for our uses, and maybe expanded directly for dask arrays.
This method of unpacking all flags at once can inflate data (in my works it goes from a single NC_SHORT variable to 16 boolean arrays). Maybe I can work on extracting only the flags the user is interested in.
So if no one is already on that I can start working on a PR !
A PR is very welcome!
extract flags masks
What does this mean? You need a time series of booleans for each bit in the mask?
@rsignell-usgs, @ocefpaf: do you have suggestions on what might be useful here?
extract flags masks
What does this mean? You need a time series of booleans for each bit in the mask?
The flag variable codes for as many boolean masks as indicated by the flag_masks attribute. So instead of unpacking everything maybe we could only retrieve one boolean mask (with the same dimensions as the flag variable) based on its corresponding flag_meanings.
So for
sensor_status_qc:flag_masks = 1b, 2b, 4b, 8b, 16b, 32b ;
sensor_status_qc:flag_meanings = "low_battery processor_fault
memory_fault disk_fault
software_fault
maintenance_required" ;
So you want sensor_status_qc.cf == "memory_fault" to give a boolean time series?
For example 3.5 it should return a boolean array with dimensions (time, depth, lat, lon), as for byte sensor_status_qc(time, depth, lat, lon).
Ah yes that's right. I would be happy to merge a PR with this functionality =) > The core code for flag variables is https://github.com/xarray-contrib/cf-xarray/blob/1f590210a1e9d1443c6b3f273fede5adc42085b2/cf_xarray/accessor.py#L924-L935
I'll start working on that then. I still have to see how to make it work with example 3.6 which combines flag_masks and flag_values...
@DocOtak Are "flag masks" something you deal with at CCHDO? Do you have opinions on how cf-xarray can provide a useful interface for taking advantage of them?
Pinging @kwilcox here b/c he may be interested in this.
@dcherian We are still using the WOCE QC flags (all three versions of them) so flag masks aren’t something we deal with. Flag masks as I read them, are basically a “compression” technique for encoding multiple states and the allowed combinations. Basically, I think the most useful interface would be one that hides flag masks and just lets me ask if a flag value is set and not care how it is encoded.
Given the following example from the CF document:
byte sensor_status_qc(time, depth, lat, lon) ;
sensor_status_qc:long_name = "Sensor Status" ;
sensor_status_qc:standard_name = "status_flag" ;
sensor_status_qc:_FillValue = 0b ;
sensor_status_qc:valid_range = 1b, 15b ;
sensor_status_qc:flag_masks = 1b, 2b, 12b, 12b, 12b ;
sensor_status_qc:flag_values = 1b, 2b, 4b, 8b, 12b ;
sensor_status_qc:flag_meanings =
"low_battery
hardware_fault
offline_mode calibration_mode maintenance_mode" ;
There are three independent states values and I might expect to be able to do boolean logic on these:
low_battery(true/false so only one bit needed)hardware_fault(true/false so only one bit needed)*_mode(with 3 possible values, two bits needed)
The masks are basically which bits you look at to determine the value for these states (I didn’t look, but assume the bitwise and has precedence over the equality test):
- low_battery -> where
(variable value) & 1b == 1b - hardware_fault -> where
(variable value) & 2b == 2b - offline_mode -> where
(variable value) & 12b == 4b - calibration_mode -> where
(variable value) & 12b == 8b - maintenance_mode -> where
(variable value) & 12b == 12b
I suspect the easiest thing to understand as a user would be just some “virtual” boolean variables with a similar property interface as xr.Dataset, assuming the flag meanings are compatible with being properties, just fallback to the usual “getter” protocol if not. (I’m also guessing these should be somewhat lazy in their evaluation):
- ds.cf.flag.low_battery => (some boolean array)
- ds.cf.flag.offline_mode => (some boolean array)
This would let me do the flag logic myself and not really need to figure out a (new?) query interface…
Things with low battery and a hardware fault:
ds.cf.flag.low_battery & ds.cf.flag.hardware_fault
Things in maintenance_mode:
ds.cf.flag.maintenance_mode
This combination is not allowed by the flag masks/values and would result in an all false, I wouldn’t expect it to throw:
ds.cf.offline_mode & ds.cf.calibration_mode
It can get verbose… things with a low battery an no other issues:
ds.cf.flag.low_battery & ~ds.cf.flag.low_battery & ~(ds.cf.flag.offline_mode | ds.cf.flag.calibration_mode | ds.cf.flag. maintenance_mode)
ds.cf.flag.low_battery => (some boolean array)
I like this idea.
ds.cf.flags could return a dictionary or namedtuple mapping "flag_meaning" to a boolean mask.
Thank you @DocOtak, this is well explained, and I also like the proposed interface. I can look into implementing it.
One thing though, the interface implemented so far had inequalities supported and I am not sure in what cases it could be useful to write
ds.cf > 'some_flag'
I did not port it for flag_masks as it make things a bit more complicated. But maybe someone here has a use case for it ?
For those interested, #354 is almost ready but could use some testing for correctness and feedback on whether the current API is useful.
Thanks to @Descanonge !
The current design is
da.cf.flags -> Dataset
which I quite like.
This is great to hear, excited to see it go live!
Thanks for your work on this!
Can you test out #354 for your use case please?
Sure, I'll give it a shot.