
Arraytransform

Open raybellwaves opened this issue 2 years ago • 2 comments

Completely untested. Looking for feedback on how to test it. Hopefully I'll also come back to this and work out how to test it.

The idea is to create an xarray catalog entry that is a subsample of another xarray entry.

e.g. global_dataset has lat: [-90: 90] and lon: [-180: 180]; northern_hemisphere_dataset is created as ds.sel({"lat": slice(0, 90)}).

Apologies for the black formatting here. I believe my PR is legible at the bottom, but I can remove it if desired.

Edit:

I kind of got it to work, but it's very hacky. I can't work out how to pass a dict(...) to YAML and have Python evaluate it without it being converted to a string. In the hacky example below I put it on two lines and use eval...

IPython:

import xarray as xr
xr.tutorial.open_dataset("air_temperature").to_zarr("air_temperature.zarr")

create test.yaml as:

metadata:
  version: 1
sources:
  air_temperature:
    description: description
    driver: zarr
    args:
      urlpath: air_temperature.zarr

back to IPython:

from intake import open_catalog
cat = open_catalog("test.yaml")
ds = cat.air_temperature.read()
# test selecting
ds.sel({"lon": slice(200, 210)})
# test as a function
def f_select_subsample(ds, _subsample_dict):
    return ds.sel(_subsample_dict)
_subsample_dict = dict([("lon", slice(200, 210))])
f_select_subsample(ds, _subsample_dict)

create my_derived.py:

from intake import Schema
from intake.source.derived import GenericTransform


class ArrayTransform(GenericTransform):
    """Transform where the input and output are both Dask-compatible arrays

    This derives from GenericTransform, and you must supply ``transform`` and
    any ``transform_kwargs``.
    """

    input_container = "xarray"
    container = "xarray"
    optional_params = {}
    _ds = None

    def to_dask(self):
        if self._ds is None:
            self._pick()
            print(self._params)
            self._ds = self._transform(
                self._source.to_dask(), **self._params["transform_kwargs"]
            )
        return self._ds

    def _get_schema(self):
        """load metadata only if needed"""
        self.to_dask()
        return Schema(
            datashape=None,
            dtype=None,
            shape=None,
            npartitions=None,
            # xarray objects have no ``extra_metadata`` attribute; pass their attrs
            extra_metadata=dict(self._ds.attrs),
        )

    def read(self):
        return self.to_dask().compute()
    
    
class Subsample(ArrayTransform):
    """Simple array transform to subsample an array

    Given as an example of how to make a specific array transform.
    Note that you could use ArrayTransform directly, by writing a
    function to choose the subsample instead of a method as here.
    """

    input_container = "xarray"
    container = "xarray"
    required_params = ["subsample_dict"]

    def __init__(self, subsample_dict, **kwargs):
        """
        subsample_dict: dict with dimension names as keys and slices as values
            Subsample to choose from the target array
        """
        # this class wants required "subsample_dict", but ArrayTransform
        # uses "transform_kwargs", which we don't need since we use a method for the
        # transform
        kwargs.update(
            transform=self.select_subsample,
            subsample_dict=subsample_dict,
            transform_kwargs={},
        )
        super().__init__(**kwargs)

    def select_subsample(self, ds):
        return ds.sel(self._params["subsample_dict"])
    
    
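def f_select_subsample(ds, subsample_dict):
    # YAML loads "slice(200, 210)" as a plain string, so each value arrives as str
    print(subsample_dict)  # debug output
    print(type(subsample_dict))
    # Hack: eval turns each string back into a slice object.
    # eval runs arbitrary code, so only use this with a trusted catalog.
    subsample_dict2 = {k: eval(v) for k, v in subsample_dict.items()}
    print(type(subsample_dict2))
    return ds.sel(subsample_dict2)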
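
One possible way around the eval call, while still writing slice(200, 210) as a string in the YAML, is to parse the string with the standard-library ast module and accept only a call to slice with literal arguments. This is a minimal sketch, not part of intake; the helper name parse_slice is made up for illustration.

```python
import ast


def parse_slice(text):
    """Turn a string such as "slice(200, 210)" into a slice object.

    Hypothetical helper (not an intake API). Unlike eval, it rejects
    anything other than a plain ``slice(...)`` call with literal arguments.
    """
    node = ast.parse(text.strip(), mode="eval").body
    is_slice_call = (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "slice"
        and not node.keywords
    )
    if not is_slice_call:
        raise ValueError(f"expected 'slice(...)', got {text!r}")
    # literal_eval only accepts literals (numbers, strings, None, ...)
    args = [ast.literal_eval(arg) for arg in node.args]
    return slice(*args)


# The transform function could then build its indexers like this:
subsample_dict = {"lon": "slice(200, 210)"}
indexers = {dim: parse_slice(value) for dim, value in subsample_dict.items()}
print(indexers)  # {'lon': slice(200, 210, None)}
```

Inside f_select_subsample, the eval(v) call could be swapped for parse_slice(v) without touching the YAML.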

back to test.yaml and add the following at the bottom:

  air_temperature_subsample:
    description: description
    driver: my_derived.ArrayTransform
    args:
      targets:
        - air_temperature
      transform: "my_derived.f_select_subsample"
      transform_kwargs:
        subsample_dict:
          lon: slice(200, 210)

then in IPython try:

cat = open_catalog("test.yaml")
cat.air_temperature_subsample.read()
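
A variant of the steps above that keeps strings out of the YAML altogether: write each selection as a two- or three-element list (for example lon: [200, 210] under subsample_dict), which YAML loads as a plain Python list, and convert the lists to slices inside the transform function. This is a sketch under that assumption; to_indexers and f_select_subsample_from_lists are hypothetical names, not part of intake or xarray.

```python
def to_indexers(subsample_dict):
    """Convert {"lon": [200, 210]} into {"lon": slice(200, 210)}.

    Lists and tuples become slices (start, stop[, step]); any other value
    (for example a single label) is passed through unchanged.
    """
    indexers = {}
    for dim, value in subsample_dict.items():
        if isinstance(value, (list, tuple)):
            if not 2 <= len(value) <= 3:
                raise ValueError(
                    f"{dim}: expected [start, stop] or [start, stop, step]"
                )
            indexers[dim] = slice(*value)
        else:
            indexers[dim] = value
    return indexers


def f_select_subsample_from_lists(ds, subsample_dict):
    """Transform function: select a subsample of an xarray object."""
    return ds.sel(to_indexers(subsample_dict))


print(to_indexers({"lon": [200, 210], "lat": [75, 50]}))
# {'lon': slice(200, 210, None), 'lat': slice(75, 50, None)}
```

If this lived in my_derived.py, the catalog entry would point transform at "my_derived.f_select_subsample_from_lists" and list the bounds as lon: [200, 210].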

raybellwaves • May 09 '22 16:05

In the code submitted, there's no string hackery anymore, right?

I like the idea here, and I don't see any reason not to include these. Best would be also to add a docs section to the transforms page, showing how (and why) you might use these.

martindurant • May 18 '22 20:05

Thinking about this again, I may add this to https://github.com/intake/intake-xarray, which may be easier to test.

Once added, I'll make a note in the transform docs here.

raybellwaves • May 23 '22 02:05

Closing in favor of https://github.com/intake/intake/pull/682

raybellwaves • Aug 20 '22 01:08