Send correct blocks.
This fix affects access via the server.
The client side constructs an xarray.Dataset backed by dask arrays with some
chunking. When it loads data, it requests partitions specified by a variable
name and a block "part", as in ('x', 0, 0, 1).
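To make the block addressing concrete, here is a hedged sketch of how a chunked dask array names its blocks; the shapes are taken from the traceback below, and the extra index (four axes rather than the three in the example part) is illustrative:

```python
import dask.array as da

# A lazily chunked array; each block is addressed by its index along
# every axis, analogous to the "part" tuple the client sends.
arr = da.zeros((164, 1, 4000, 3840), chunks=(41, 1, 1000, 960))
print(arr.numblocks)                 # (4, 1, 4, 4)
print(arr.blocks[0, 0, 0, 1].shape)  # (41, 1, 1000, 960)
```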
If, on the server side, the DataSourceMixin subclass is holding a plain
numpy array, not a dask array, then it ignores the "part" and always sends
the whole array for the requested variable.
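A minimal sketch of the intended fix on the server side: when the backing array is plain numpy, slice out the requested block instead of returning the whole array. The `slice_block` helper and its signature are hypothetical, not the actual code in this PR:

```python
import numpy as np

def slice_block(arr, block_idx, chunks):
    """Return the block of a numpy array addressed by dask-style indices.

    ``chunks`` gives the chunk length along each axis,
    e.g. (41, 1, 1000, 960) for the shapes in the traceback below.
    """
    slices = tuple(
        slice(i * c, (i + 1) * c) for i, c in zip(block_idx, chunks)
    )
    return arr[slices]

arr = np.arange(24).reshape(2, 3, 4)
block = slice_block(arr, (1, 0, 1), chunks=(1, 3, 2))
print(block.shape)  # (1, 3, 2)
```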
On the client side, this manifests as a mismatch between the dask array's shape (the shape of the data it expects) and the shape of the numpy array that it actually receives, leading to errors like
```
ValueError: replacement data must match the Variable's shape
> /sdcc/u/dallan/venv/test-databroker/lib64/python3.6/site-packages/xarray/core/variable.py(301)data()
    299         if data.shape != self.shape:
    300             raise ValueError(
--> 301                 "replacement data must match the Variable's shape")
    302         self._data = data
    303

ipdb> data.shape
(164, 1, 4000, 3840)
ipdb> self.shape
(41, 1, 1000, 960)
```
Here the data that arrives is larger than the data expected: the whole array was sent where a single block should have been.
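The error above can be reproduced directly with xarray's shape check, independent of the server; the dimension names and (much smaller) shapes here are made up for illustration:

```python
import numpy as np
import xarray as xr

# A Variable sized for one expected block.
var = xr.Variable(dims=("y", "x"), data=np.zeros((2, 2)))
try:
    # Assigning the "whole array" where one block was expected.
    var.data = np.zeros((4, 4))
except ValueError as err:
    print(err)  # the shape-mismatch error quoted above
```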
I expect it's worth refining this to make it more efficient before merging, and it needs a test. This is just a request for comments and suggestions.
I haven't had a chance to investigate the failure yet.
The subclasses that override _get_schema in the base class DataSourceMixin do so without calling super(), so self._chunks is never defined. There is a fair amount of copy-paste between the base class and its subclasses, so the easiest fix might be to remove the duplication and use super(). I can't get to this today, but can revisit later this week.
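The suggested refactor can be sketched as follows; everything here other than the names DataSourceMixin and _get_schema (the subclass, its fields, and the schema contents) is hypothetical:

```python
class DataSourceMixin:
    def _get_schema(self):
        # Shared setup every subclass needs; runs exactly once per source
        # if subclasses delegate here via super().
        self._chunks = getattr(self, "_chunks", None) or "auto"
        return {"chunks": self._chunks}

class SomeSubclass(DataSourceMixin):
    def _get_schema(self):
        schema = super()._get_schema()  # ensures self._chunks is defined
        schema["extra"] = "subclass-specific fields go here"
        return schema

src = SomeSubclass()
print(src._get_schema())  # {'chunks': 'auto', 'extra': '...'}
```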