intake
Arraytransform
Completely untested. Looking for feedback on how to test this. Hopefully I'll also come back to it and work out how to test it.
The idea is to create an xarray catalog entry that is a subsample of another xarray entry,
e.g. global_dataset has lat: [-90: 90] and lon: [-180: 180], and northern_hemisphere_dataset is created as ds.sel({"lat": slice(0, 90)}).
Apologies for the Black formatting here. I believe my PR is legible at the bottom, but I can remove it if desired.
Edit:
I kind of got it to work, but it's very hacky. I can't work out how to pass a dict(...) through YAML and have Python evaluate it without it being converted to a string. In the hacky example below I put it on two lines and use eval...
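One possible way around the string problem, sketched here under assumptions rather than taken from the PR: write each slice in YAML as a plain list (e.g. `lon: [200, 210]`), which a YAML loader returns as a Python list, and convert it to a `slice` inside the transform. The helper name `to_indexers` and the list convention are hypothetical.

```python
def to_indexers(spec):
    """Turn {"lon": [200, 210]} into {"lon": slice(200, 210)}.

    Hypothetical convention: a list or tuple of two or three items is
    read as (start, stop[, step]); any other value is passed through.
    """
    indexers = {}
    for dim, value in spec.items():
        if isinstance(value, (list, tuple)) and len(value) in (2, 3):
            indexers[dim] = slice(*value)
        else:
            indexers[dim] = value
    return indexers


# What a YAML loader would hand over for
#   subsample_dict:
#     lon: [200, 210]
#     time: "2013-01-01"
spec = {"lon": [200, 210], "time": "2013-01-01"}
indexers = to_indexers(spec)
```

The result could then be passed to `ds.sel(indexers)`. The trade-off is that a two- or three-item list can no longer mean "select exactly these labels", so this only suits catalogs that subsample by range.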
IPython:
import xarray as xr
xr.tutorial.open_dataset("air_temperature").to_zarr("air_temperature.zarr")
create test.yaml as:
metadata:
  version: 1
sources:
  air_temperature:
    description: description
    driver: zarr
    args:
      urlpath: air_temperature.zarr
back to IPython:
from intake import open_catalog
cat = open_catalog("test.yaml")
ds = cat.air_temperature.read()
# test selecting
ds.sel({"lon": slice(200, 210)})
# test as a function
def f_select_subsample(ds, _subsample_dict):
    return ds.sel(_subsample_dict)
_subsample_dict = dict([("lon", slice(200, 210))])
f_select_subsample(ds, _subsample_dict)
create my_derived.py:
from intake import Schema
from intake.source.derived import GenericTransform
class ArrayTransform(GenericTransform):
    """Transform where the input and output are both Dask-compatible arrays

    This derives from GenericTransform, and you must supply ``transform`` and
    any ``transform_kwargs``.
    """

    input_container = "xarray"
    container = "xarray"
    optional_params = {}
    _ds = None

    def to_dask(self):
        if self._ds is None:
            self._pick()
            print(self._params)
            self._ds = self._transform(
                self._source.to_dask(), **self._params["transform_kwargs"]
            )
        return self._ds

    def _get_schema(self):
        """load metadata only if needed"""
        self.to_dask()
        return Schema(
            datashape=None,
            dtype=None,
            shape=None,
            npartitions=None,
            # xarray objects have no ``extra_metadata`` attribute; use ``attrs``
            extra_metadata=self._ds.attrs,
        )

    def read(self):
        return self.to_dask().compute()
class Subsample(ArrayTransform):
    """Simple array transform to subsample an array

    Given as an example of how to make a specific array transform.
    Note that you could use ArrayTransform directly, by writing a
    function to choose the subsample instead of a method as here.
    """

    input_container = "xarray"
    container = "xarray"
    required_params = ["subsample_dict"]

    def __init__(self, subsample_dict, **kwargs):
        """
        subsample_dict: dict with dimension names as keys and slices as values
            Subsample to choose from the target array
        """
        # this class requires "subsample_dict", but ArrayTransform uses
        # "transform_kwargs", which we don't need since we use a method for
        # the transform
        kwargs.update(
            transform=self.select_subsample,
            subsample_dict=subsample_dict,
            transform_kwargs={},
        )
        super().__init__(**kwargs)

    def select_subsample(self, ds):
        return ds.sel(self._params["subsample_dict"])
def f_select_subsample(ds, subsample_dict):
    print(subsample_dict)
    print(type(subsample_dict))
    # hack: eval each slice, which came in from the YAML as a string
    subsample_dict2 = {k: eval(v) for k, v in subsample_dict.items()}
    print(type(subsample_dict2))
    return ds.sel(subsample_dict2)
back to test.yaml and add the following at the bottom:
  air_temperature_subsample:
    description: description
    driver: my_derived.ArrayTransform
    args:
      targets:
        - air_temperature
      transform: "my_derived.f_select_subsample"
      transform_kwargs:
        subsample_dict:
          lon: slice(200, 210)
then in IPython try:
cat = open_catalog("test.yaml")
cat.air_temperature_subsample.read()
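If the catalog keeps the `slice(200, 210)` string form, the `eval` call in `f_select_subsample` could in principle be swapped for a parser that only accepts slice expressions with literal arguments. The following is a rough sketch using the standard-library `ast` module; the helper name `parse_slice` is hypothetical and this is not what the PR does.

```python
import ast


def parse_slice(text):
    """Parse a string such as "slice(200, 210)" into a slice, without eval."""
    node = ast.parse(text.strip(), mode="eval").body
    is_slice_call = (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "slice"
        and not node.keywords
        and 1 <= len(node.args) <= 3
    )
    if not is_slice_call:
        raise ValueError(f"not a slice expression: {text!r}")
    # literal_eval accepts only literals (numbers, strings, None, ...),
    # so arbitrary code inside the arguments is rejected with ValueError
    return slice(*(ast.literal_eval(arg) for arg in node.args))


subsample_dict = {"lon": "slice(200, 210)"}
parsed = {dim: parse_slice(value) for dim, value in subsample_dict.items()}
```

Anything other than a call to `slice` with one to three literal arguments raises `ValueError`, so a catalog file cannot smuggle arbitrary expressions into the transform the way it could with `eval`.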
In the code submitted, there's no string hackery any more, right?
I like the idea here, and I don't see any reason not to include these. Best would be also to add a docs section to the transforms page, showing how (and why) you might use these.
Thinking about this again, I may add this to https://github.com/intake/intake-xarray, which may be easier to test.
Once added, I'll make a note in the transform docs here.
Closing in favor of https://github.com/intake/intake/pull/682