datajoint-python
Cell array of arrays of doubles cannot be fetched in python, only in matlab
Bug Report
Description
Fetching fails in Python when an attribute (defined in MATLAB) stores a cell array in each entry and each element of that cell array is an array of doubles. Fetching in MATLAB works as expected.
Reproducibility
Windows, Python 3.9.13, DataJoint 0.13.8
Steps:
- Define and populate table in matlab containing an attribute such as:
epoch_pos_range=null : blob # list of y position ranges corresponding to n epochs in epoch_list, (e.g., {[y_on y_off],[y_on y_off]} for epoch_list {'epoch1','epoch2'})
- Fetch in matlab (works as intended)
- Attempt to fetch in python (throws a reshaping error for the array)
Error stack:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
----> 1 VM['opto'].OptoSession.fetch('epoch_pos_range')
[c:\Users\admin\.conda\envs\sandbox\lib\site-packages\datajoint\fetch.py](file:///C:/Users/admin/.conda/envs/sandbox/lib/site-packages/datajoint/fetch.py) in __call__(self, offset, limit, order_by, format, as_dict, squeeze, download_path, *attrs)
227 attributes = [a for a in attrs if not is_key(a)]
228 ret = self._expression.proj(*attributes)
--> 229 ret = ret.fetch(
230 offset=offset,
231 limit=limit,
[c:\Users\admin\.conda\envs\sandbox\lib\site-packages\datajoint\fetch.py](file:///C:/Users/admin/.conda/envs/sandbox/lib/site-packages/datajoint/fetch.py) in __call__(self, offset, limit, order_by, format, as_dict, squeeze, download_path, *attrs)
287 for name in heading:
288 # unpack blobs and externals
--> 289 ret[name] = list(map(partial(get, heading[name]), ret[name]))
290 if format == "frame":
291 ret = pandas.DataFrame(ret).set_index(heading.primary_key)
[c:\Users\admin\.conda\envs\sandbox\lib\site-packages\datajoint\fetch.py](file:///C:/Users/admin/.conda/envs/sandbox/lib/site-packages/datajoint/fetch.py) in _get(connection, attr, data, squeeze, download_path)
108 if attr.uuid
109 else (
--> 110 blob.unpack(
111 extern.get(uuid.UUID(bytes=data)) if attr.is_external else data,
112 squeeze=squeeze,
[c:\Users\admin\.conda\envs\sandbox\lib\site-packages\datajoint\blob.py](file:///C:/Users/admin/.conda/envs/sandbox/lib/site-packages/datajoint/blob.py) in unpack(blob, squeeze)
603 return blob
604 if blob is not None:
--> 605 return Blob(squeeze=squeeze).unpack(blob)
[c:\Users\admin\.conda\envs\sandbox\lib\site-packages\datajoint\blob.py](file:///C:/Users/admin/.conda/envs/sandbox/lib/site-packages/datajoint/blob.py) in unpack(self, blob)
127 blob_format = self.read_zero_terminated_string()
128 if blob_format in ("mYm", "dj0"):
--> 129 return self.read_blob(n_bytes=len(self._blob) - self._pos)
130
131 def read_blob(self, n_bytes=None):
[c:\Users\admin\.conda\envs\sandbox\lib\site-packages\datajoint\blob.py](file:///C:/Users/admin/.conda/envs/sandbox/lib/site-packages/datajoint/blob.py) in read_blob(self, n_bytes)
161 % data_structure_code
162 )
--> 163 v = call()
164 if n_bytes is not None and self._pos - start != n_bytes:
165 raise DataJointError("Blob length check failed! Invalid blob")
[c:\Users\admin\.conda\envs\sandbox\lib\site-packages\datajoint\blob.py](file:///C:/Users/admin/.conda/envs/sandbox/lib/site-packages/datajoint/blob.py) in read_cell_array(self)
493 return (
494 self.squeeze(
--> 495 np.array(result).reshape(shape, order="F"), convert_to_scalar=False
496 )
497 ).view(MatCell)
ValueError: cannot reshape array of size 4 into shape (1,2)
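The last frame can be reproduced outside DataJoint with plain NumPy. This is a minimal sketch, assuming each cell element unpacks to a 1x2 array of doubles (the values and the variable names `result`, `shape`, and `stacked` are made up for illustration). Because both elements have the same shape, `np.array` stacks them into one numeric array with four values instead of keeping two separate cells, so the reshape to the cell array's own shape (1, 2) fails.

```python
import numpy as np

# Two already-decoded cell elements, e.g. {[y_on y_off], [y_on y_off]}.
# Assumption: each element arrives as a 1x2 array; the values are invented.
result = [np.array([[0.0, 1.0]]), np.array([[2.0, 3.0]])]
shape = (1, 2)  # shape of the MATLAB cell array itself

# Equal-shaped elements are stacked into one numeric array of shape (2, 1, 2),
# i.e. 4 values, rather than kept as 2 separate cells.
stacked = np.array(result)

try:
    stacked.reshape(shape, order="F")
    error_message = None
except ValueError as err:
    error_message = str(err)

print(stacked.shape, stacked.size)
print(error_message)
```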
Thanks for the report, @renanmcosta. Typically the MATLAB cell array gets properly packed and unpacked. We have not encountered the error that you reported. We will investigate further and get back to you.
For now I've managed to fetch with the temporary fix below. I don't think it's very robust, but I'm copying it here in case it's informative.
def read_cell_array(self):
    """deserialize MATLAB cell array"""
    n_dims = self.read_value()
    shape = self.read_value(count=n_dims)
    n_elem = int(np.prod(shape))
    result = [self.read_blob(n_bytes=self.read_value()) for _ in range(n_elem)]
    # if not all elements are scalars; shouldn't work for ragged arrays
    if n_elem != len(np.ravel(result, order="F")):
        shape = (-1,) + tuple(shape[1:n_dims])
    return (
        self.squeeze(
            np.array(result).reshape(shape, order="F"), convert_to_scalar=False
        )
    ).view(MatCell)
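One possible way to avoid the stacking altogether is to place each decoded element into a preallocated object array, so the result always has exactly one slot per cell, whether or not the elements share a shape. The sketch below is a standalone illustration only: `cells_to_object_array` is a hypothetical helper, not part of DataJoint, and it leaves out the `squeeze` and `MatCell` view steps that `read_cell_array` performs.

```python
import numpy as np


def cells_to_object_array(elements, shape):
    """Hypothetical helper: arrange decoded cell elements into an object array.

    elements: list of already-decoded cell contents, in MATLAB (column-major) order
    shape:    shape of the cell array itself, e.g. (1, 2)
    """
    shape = tuple(int(n) for n in shape)
    cells = np.empty(len(elements), dtype=object)
    for i, element in enumerate(elements):
        cells[i] = element  # each element is stored as a single object
    return cells.reshape(shape, order="F")


# Equal-shaped elements stay separate instead of being stacked.
same = cells_to_object_array(
    [np.array([[0.0, 1.0]]), np.array([[2.0, 3.0]])], (1, 2)
)

# Ragged elements are handled the same way.
ragged = cells_to_object_array([np.zeros((2, 3)), np.zeros(5)], (1, 2))

print(same.shape, same[0, 1])
print(ragged[0, 0].shape, ragged[0, 1].shape)
```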
Greetings,
I have just encountered the same problem, and the temporary fix seems to work (thanks a lot, @renanmcosta).
The temporary fix returns an array, but with shape = (537000, 2).
In MATLAB it's a 1×2 cell array: {10×5370×10 single} {10×5370×10 single}.
type(temp_fixed) --> datajoint.blob.MatCell
Can I retrieve the original dimensions, or is this a robustness problem of the temporary fix?
Thanks in advance
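Judging from the temporary fix's code, the (537000, 2) shape comes from stacking the two equal-shaped cells and reshaping to (-1, 2) in Fortran order, where 537000 = 10 × 5370 × 10. That reshape only rearranges the values, so it can be undone, but the per-cell shape is not kept in the result and has to be supplied by hand (here it would come from MATLAB). The sketch below shows the idea with plain NumPy and small stand-in arrays. It assumes every cell has the same known shape; the names `cell_shape`, `originals`, `flattened`, and `recovered` are illustrative.

```python
import numpy as np

# Small stand-ins for the two 10x5370x10 cells.
# Assumption: the per-cell shape is known and identical for all cells.
cell_shape = (3, 4, 5)
originals = [
    np.arange(60, dtype=np.float32).reshape(cell_shape),
    -np.arange(60, dtype=np.float32).reshape(cell_shape),
]

# What the temporary fix effectively does for a 1x2 cell array:
# stack the cells, then reshape to (-1, 2) in Fortran order.
flattened = np.array(originals).reshape((-1, 2), order="F")

# Undo it: reshape back to (n_cells, *cell_shape) with the same order,
# then take one slice per cell.
restored = flattened.reshape((len(originals),) + cell_shape, order="F")
recovered = [restored[i] for i in range(len(originals))]

print(flattened.shape)
print([cell.shape for cell in recovered])
```

For the arrays described above, the equivalent call would be `temp_fixed.reshape((2, 10, 5370, 10), order="F")`, again assuming both cells really are 10×5370×10.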