clouddrift
Range-aware subset
As discussed today with @selipot, who proposed this idea.
The current implementation of `subset` is cloud-optimized for criteria that have the `traj` dimension, for example, subsetting by ID:
subset(ds, {"ID": [2578, 2582, 2583]})
However, subsetting by criteria that have the `obs` dimension, for example, subsetting by region or time:
subset(ds, {"lat": (21, 31), "lon": (-98, -78)})
requires downloading, in full, every variable that appears in the criteria so that the comparison can be made locally.
However, if the range (min and max) of these variables were known for each trajectory, `subset` could subset by ID under the hood, thus effectively doing the subset by the `obs` dimension in a cloud-optimized way.
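To make the idea concrete, here is a minimal sketch of the trajectory pre-selection step. It is not part of clouddrift: the helper `overlaps` and the sample values are hypothetical, and the range arrays are assumed to be small per-trajectory arrays that are cheap to download. A trajectory is kept only if its [min, max] interval overlaps the requested bounds for every criterion.

```python
import numpy as np

def overlaps(var_min, var_max, bounds):
    """Hypothetical helper: True where a trajectory's [min, max]
    interval intersects the requested (low, high) bounds."""
    low, high = bounds
    return (np.asarray(var_max) >= low) & (np.asarray(var_min) <= high)

# Made-up per-trajectory ranges for three trajectories.
ids = np.array([2578, 2582, 2583])
lat_min = np.array([10.0, 22.0, 35.0])
lat_max = np.array([20.0, 28.0, 40.0])
lon_min = np.array([-90.0, -95.0, -80.0])
lon_max = np.array([-85.0, -80.0, -70.0])

# Keep trajectories whose ranges overlap every criterion.
mask = overlaps(lat_min, lat_max, (21, 31)) & overlaps(lon_min, lon_max, (-98, -78))
candidate_ids = ids[mask]
```

The resulting IDs could then be passed to the existing ID-based path, for example `subset(ds, {"ID": candidate_ids})`. Note that overlapping ranges only show that a trajectory may contain matching observations, so the observation-level comparison would still need to run, but only on the reduced set of trajectories.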
`clouddrift` could propose the following requirement for cloud-optimized ragged arrays: every numeric variable `<var>` with the `obs` dimension will be accompanied by the variables `<var_min>` and `<var_max>` with the `traj` dimension.
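As an illustration of how such range variables might be produced when a dataset is built, the sketch below computes per-trajectory minima and maxima from a flat `obs` array. It assumes that the number of observations per trajectory is available as an array (called `rowsize` here) and that every trajectory has at least one observation; the function name `trajectory_ranges` is hypothetical.

```python
import numpy as np

def trajectory_ranges(values, rowsize):
    """Hypothetical helper: per-trajectory min and max of a flat
    obs-dimension array, given the observation count per trajectory."""
    values = np.asarray(values)
    rowsize = np.asarray(rowsize)
    if np.any(rowsize <= 0) or rowsize.sum() != values.size:
        raise ValueError("rowsize must be positive and sum to len(values)")
    # Index of the first observation of each trajectory.
    starts = np.concatenate(([0], np.cumsum(rowsize)[:-1]))
    # reduceat reduces each segment [starts[i], starts[i+1]).
    return np.minimum.reduceat(values, starts), np.maximum.reduceat(values, starts)

# Made-up latitudes for three trajectories of lengths 3, 2 and 3.
lat = np.array([10.0, 15.0, 20.0, 22.0, 28.0, 35.0, 40.0, 37.0])
rowsize = np.array([3, 2, 3])
lat_min, lat_max = trajectory_ranges(lat, rowsize)
```

`np.minimum` and `np.maximum` propagate NaN, so a trajectory with any missing value would get a NaN range in this sketch; `np.fmin` and `np.fmax` could be swapped in to ignore missing values instead.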
If the expected range variables are not found in the dataset, `subset` could fall back to carrying out the comparison as in the current implementation.
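The fallback decision could look roughly like the following sketch. The function `split_criteria` is hypothetical, and the `_min`/`_max` suffixes are one assumed reading of the `<var_min>`/`<var_max>` naming above. Each criterion is routed to the range-aware path only when both of its range variables are present.

```python
def split_criteria(variable_names, criteria):
    """Hypothetical helper: split criteria into those that can use
    range variables and those that need the current full comparison."""
    range_aware, fallback = [], []
    for name in criteria:
        # Assumed naming convention: "<var>_min" and "<var>_max".
        if f"{name}_min" in variable_names and f"{name}_max" in variable_names:
            range_aware.append(name)
        else:
            fallback.append(name)
    return range_aware, fallback

# Made-up dataset that provides range variables for lat only.
variable_names = {"ID", "lat", "lon", "time", "lat_min", "lat_max"}
criteria = {"lat": (21, 31), "lon": (-98, -78)}
range_aware, fallback = split_criteria(variable_names, criteria)
```

With this split, datasets that do not follow the proposed convention would behave exactly as they do today, and datasets that follow it only partially would still benefit for the variables that have ranges.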