intake-xarray

Send correct blocks.

Open danielballan opened this issue 6 years ago • 2 comments

This fix affects access via the server.

The client side constructs an xarray.Dataset backed by dask arrays with some chunking. When it loads data, it requests partitions specified by a variable name and a block "part", as in ('x', 0, 0, 1).

If, on the server side, the DataSourceMixin subclass is holding a plain numpy array, not a dask array, then it ignores the "part" and always sends the whole array for the requested variable.
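The fix described above amounts to slicing the requested block out of the in-memory array before sending it. Here is a minimal sketch, assuming the server knows each variable's chunking in dask's normalized form (one tuple of chunk sizes per axis). The helpers `block_slices` and `read_part` are hypothetical illustrations, not intake-xarray's actual API:

```python
import numpy as np


def block_slices(chunks, block_index):
    """Return a tuple of slices that selects one block of an array.

    chunks: per-axis tuples of chunk sizes, e.g. ((2, 2), (3, 3))
    block_index: one block number per axis, e.g. (1, 0)
    """
    slices = []
    for sizes, i in zip(chunks, block_index):
        start = sum(sizes[:i])  # offset of block i along this axis
        slices.append(slice(start, start + sizes[i]))
    return tuple(slices)


def read_part(variables, chunks, part):
    """Serve one block for a request such as ('x', 0, 0, 1).

    variables: hypothetical mapping of variable name -> numpy array
    chunks: hypothetical mapping of variable name -> dask-style chunks
    """
    name, *block_index = part
    arr = np.asarray(variables[name])
    # Return only the requested block, not the whole array.
    return arr[block_slices(chunks[name], block_index)]
```

For example, with `arr = np.arange(24).reshape(4, 6)` chunked as `((2, 2), (3, 3))`, the request `('x', 1, 0)` returns `arr[2:4, 0:3]`, a `(2, 3)` block, rather than the full `(4, 6)` array.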

On the client side, this manifests as a mismatch between the dask array's shape (the shape of the data it expects) and the shape of the numpy array that it receives, leading to errors like:

ValueError: replacement data must match the Variable's shape


> /sdcc/u/dallan/venv/test-databroker/lib64/python3.6/site-packages/xarray/core/variable.py(301)data()
    299         if data.shape != self.shape:
    300             raise ValueError(
--> 301                 "replacement data must match the Variable's shape")
    302         self._data = data
    303 

ipdb>  data.shape
(164, 1, 4000, 3840)
ipdb>  self.shape
(41, 1, 1000, 960)

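where the data that arrives is larger than the data expected: 4 times larger along three of the four axes, which is consistent with the server sending the whole variable instead of the single block that was requested.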
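A quick check of the shapes from the ipdb session above, as a sketch: dividing the received shape by the expected shape axis by axis gives the number of blocks along each axis. This assumes uniform chunking; the variable names are illustrative only.

```python
import math

received = (164, 1, 4000, 3840)  # data.shape in the ipdb session
expected = (41, 1, 1000, 960)    # self.shape in the ipdb session

# Every axis divides evenly, so the received array could be tiled by
# blocks of the expected shape.
assert all(r % e == 0 for r, e in zip(received, expected))

# Number of expected-size blocks along each axis, and in total.
ratios = tuple(r // e for r, e in zip(received, expected))
print(ratios)             # (4, 1, 4, 4)
print(math.prod(ratios))  # 64
```

So under that assumption, the client asked for one block out of a 4 x 1 x 4 x 4 grid (64 blocks) and received all of them.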

I expect it's worth refining this to make it more efficient before merging, and it needs a test. This is just a request for comments and suggestions.

danielballan, Sep 17 '19 20:09

I haven't had a chance to investigate the failure.

martindurant, Sep 18 '19 12:09

The subclasses override `_get_schema` from the base class `DataSourceMixin` without calling `super()`, so `self._chunks` is never defined. There is a fair amount of copy-paste between the base class and its subclasses, so the easiest fix might be to remove the duplicated code and call `super()` instead. I can't get to this today, but can revisit it later this week.

danielballan, Sep 18 '19 19:09
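The `super()` fix suggested in the last comment could look roughly like this. It is a sketch with hypothetical stand-in classes (`DataSourceMixin` here is a simplified stub, and `ExampleSource` is invented for illustration), not intake-xarray's real code: shared bookkeeping such as setting `_chunks` lives only in the base class, and the subclass delegates to it rather than duplicating it.

```python
class DataSourceMixin:
    """Hypothetical simplified base class, not intake-xarray's code."""

    def __init__(self, chunks=None):
        self.chunks = chunks
        self._ds = None

    def _open_dataset(self):
        raise NotImplementedError

    def _get_schema(self):
        # Shared logic lives in one place, including setting _chunks.
        if self._ds is None:
            self._open_dataset()
        # Hypothetical: the real code would derive this from the dataset.
        self._chunks = self.chunks
        return {"chunks": self._chunks}


class ExampleSource(DataSourceMixin):
    """Hypothetical subclass that only knows how to open its data."""

    def _open_dataset(self):
        self._ds = {"x": [1, 2, 3]}  # placeholder for an opened dataset

    def _get_schema(self):
        # Delegate to the base class instead of copy-pasting it, so
        # self._chunks is always set; then add subclass-specific fields.
        schema = super()._get_schema()
        schema["format"] = "example"
        return schema
```

With this layout, `ExampleSource(chunks={"x": 1})._get_schema()` sets `_chunks` on the instance, because the base-class logic always runs.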