
Cell array of arrays of doubles cannot be fetched in python, only in matlab

Open renanmcosta opened this issue 1 year ago • 7 comments

Bug Report

Description

Fetching fails in Python when each entry of a given attribute (defined in MATLAB) is a cell array whose elements are arrays of doubles. Fetching in MATLAB works as expected.

Reproducibility

Windows, Python 3.9.13, DataJoint 0.13.8

Steps:

  1. Define and populate a table in MATLAB containing an attribute such as: epoch_pos_range=null : blob # list of y position ranges corresponding to n epochs in epoch_list, (e.g., {[y_on y_off],[y_on y_off]} for epoch_list {'epoch1','epoch2'})
  2. Fetch in MATLAB (works as intended)
  3. Attempt to fetch in Python (throws a reshaping error for the array)

Error stack:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in 
----> 1 VM['opto'].OptoSession.fetch('epoch_pos_range')

[c:\Users\admin\.conda\envs\sandbox\lib\site-packages\datajoint\fetch.py](file:///C:/Users/admin/.conda/envs/sandbox/lib/site-packages/datajoint/fetch.py) in __call__(self, offset, limit, order_by, format, as_dict, squeeze, download_path, *attrs)
    227             attributes = [a for a in attrs if not is_key(a)]
    228             ret = self._expression.proj(*attributes)
--> 229             ret = ret.fetch(
    230                 offset=offset,
    231                 limit=limit,

[c:\Users\admin\.conda\envs\sandbox\lib\site-packages\datajoint\fetch.py](file:///C:/Users/admin/.conda/envs/sandbox/lib/site-packages/datajoint/fetch.py) in __call__(self, offset, limit, order_by, format, as_dict, squeeze, download_path, *attrs)
    287                 for name in heading:
    288                     # unpack blobs and externals
--> 289                     ret[name] = list(map(partial(get, heading[name]), ret[name]))
    290                 if format == "frame":
    291                     ret = pandas.DataFrame(ret).set_index(heading.primary_key)

[c:\Users\admin\.conda\envs\sandbox\lib\site-packages\datajoint\fetch.py](file:///C:/Users/admin/.conda/envs/sandbox/lib/site-packages/datajoint/fetch.py) in _get(connection, attr, data, squeeze, download_path)
    108         if attr.uuid
    109         else (
--> 110             blob.unpack(
    111                 extern.get(uuid.UUID(bytes=data)) if attr.is_external else data,
    112                 squeeze=squeeze,

[c:\Users\admin\.conda\envs\sandbox\lib\site-packages\datajoint\blob.py](file:///C:/Users/admin/.conda/envs/sandbox/lib/site-packages/datajoint/blob.py) in unpack(blob, squeeze)
    603         return blob
    604     if blob is not None:
--> 605         return Blob(squeeze=squeeze).unpack(blob)

[c:\Users\admin\.conda\envs\sandbox\lib\site-packages\datajoint\blob.py](file:///C:/Users/admin/.conda/envs/sandbox/lib/site-packages/datajoint/blob.py) in unpack(self, blob)
    127         blob_format = self.read_zero_terminated_string()
    128         if blob_format in ("mYm", "dj0"):
--> 129             return self.read_blob(n_bytes=len(self._blob) - self._pos)
    130 
    131     def read_blob(self, n_bytes=None):

[c:\Users\admin\.conda\envs\sandbox\lib\site-packages\datajoint\blob.py](file:///C:/Users/admin/.conda/envs/sandbox/lib/site-packages/datajoint/blob.py) in read_blob(self, n_bytes)
    161                 % data_structure_code
    162             )
--> 163         v = call()
    164         if n_bytes is not None and self._pos - start != n_bytes:
    165             raise DataJointError("Blob length check failed! Invalid blob")

[c:\Users\admin\.conda\envs\sandbox\lib\site-packages\datajoint\blob.py](file:///C:/Users/admin/.conda/envs/sandbox/lib/site-packages/datajoint/blob.py) in read_cell_array(self)
    493         return (
    494             self.squeeze(
--> 495                 np.array(result).reshape(shape, order="F"), convert_to_scalar=False
    496             )
    497         ).view(MatCell)

ValueError: cannot reshape array of size 4 into shape (1,2)

renanmcosta avatar Jul 06 '23 22:07 renanmcosta
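
The final `ValueError` in the traceback can be reproduced with NumPy alone. The sketch below assumes the cell holds two equal-length arrays of doubles, as in the `epoch_pos_range` example; the values are made up. Because both elements have the same shape, `np.array(result)` stacks them into a numeric 2×2 array of four values instead of an object array of two elements, so the reshape to the cell's own shape `(1, 2)` cannot succeed.

```python
import numpy as np

# Two made-up [y_on, y_off] ranges standing in for the elements of the cell array.
result = [np.array([0.0, 10.0]), np.array([20.0, 30.0])]
shape = (1, 2)  # shape of the MATLAB cell array itself: 1x2

# Equal-shaped elements are stacked into one 2x2 float array (4 values, not 2 cells).
stacked = np.array(result)
print(stacked.shape, stacked.dtype)

try:
    stacked.reshape(shape, order="F")
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```

A cell array of scalars, or of elements with differing lengths, does not take this path in the same way, which may be why the problem only shows up for cells of same-sized arrays.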

Thanks for the report, @renanmcosta. Typically the MATLAB cell array gets properly packed and unpacked. We have not encountered the error that you reported. We will investigate further and get back to you.

kabilar avatar Jul 08 '23 17:07 kabilar

For now I've managed to fetch with the temporary fix below. I don't think it's very robust, but I'm copying it here in case it's informative.

def read_cell_array(self):
    """deserialize MATLAB cell array"""
    n_dims = self.read_value()
    shape = self.read_value(count=n_dims)
    n_elem = int(np.prod(shape))
    result = [self.read_blob(n_bytes=self.read_value()) for _ in range(n_elem)]
    # If the elements are not all scalars, np.array(result) holds more than
    # n_elem values, so let the first dimension absorb the extra size.
    # Not expected to work for ragged arrays.
    if n_elem != len(np.ravel(result, order="F")):
        shape = (-1,) + tuple(shape[1:n_dims])
    return (
        self.squeeze(
            np.array(result).reshape(shape, order="F"), convert_to_scalar=False
        )
    ).view(MatCell)

renanmcosta avatar Jul 25 '23 22:07 renanmcosta
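
One possible alternative to changing the target shape is to keep NumPy from stacking the elements in the first place. The sketch below is only an illustration under that assumption, not DataJoint's implementation: `cells_to_object_array` is a hypothetical helper, and the `squeeze` and `.view(MatCell)` steps of `read_cell_array` are left out. It fills a pre-allocated object array one slot at a time, so each cell keeps its own shape and the result can be reshaped to the cell array's shape.

```python
import numpy as np


def cells_to_object_array(result, shape):
    """Store each element of `result` in its own slot of an object array.

    Hypothetical helper for illustration; not part of the DataJoint API.
    """
    cells = np.empty(len(result), dtype=object)
    for i, element in enumerate(result):
        # Assigning slot by slot prevents NumPy from stacking equal-shaped elements.
        cells[i] = element
    return cells.reshape(shape, order="F")


# Made-up data: a 1x2 cell whose elements are equal-length arrays of doubles.
result = [np.array([0.0, 10.0]), np.array([20.0, 30.0])]
cell = cells_to_object_array(result, (1, 2))
print(cell.shape)
print(cell[0, 1])
```

Because the container has `dtype=object`, the same helper also accepts ragged elements, which the shape-based workaround is not expected to handle.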

Greetings,

I have just encountered the same problem, and the temporary fix seems to work (thanks a lot, @renanmcosta).

The temporary fix returns an array, but with shape (537000, 2).
In MATLAB it is a 1×2 cell array: {10×5370×10 single} {10×5370×10 single}.

type(temp_fixed) --> datajoint.blob.MatCell

Can I retrieve the original dimensions, or is this a robustness problem of the temporary fix?

Thanks in advance

Paschas avatar Mar 08 '24 10:03 Paschas
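
The shape (537000, 2) is consistent with what the temporary fix computes: the two 10×5370×10 elements are stacked and then reshaped to `(-1, 2)` in Fortran order, so the per-element dimensions are no longer stored in the result. If the element shape is known from MATLAB, the values appear to be recoverable by applying the inverse reshape. The sketch below uses small made-up arrays in place of the real data and assumes the reshape is the only transformation applied (`squeeze` has nothing to remove for this shape).

```python
import numpy as np

# Small stand-ins for the two 10x5370x10 arrays; shapes and values are made up.
elem_shape = (2, 3, 4)
originals = [
    np.arange(24, dtype=np.float32).reshape(elem_shape),
    np.arange(24, 48, dtype=np.float32).reshape(elem_shape),
]

# What the temporary fix effectively does for a 1x2 cell of equal-shaped arrays.
collapsed = np.array(originals).reshape((-1, 2), order="F")
print(collapsed.shape)  # (24, 2)

# Undo it: a Fortran-order reshape back to (n_elem, *elem_shape).
restored = collapsed.reshape((len(originals),) + elem_shape, order="F")
print(restored[0].shape, restored[1].shape)
```

Note that the columns of the collapsed array do not correspond to the two cells: the Fortran-order reshape interleaves their values, so slicing `collapsed[:, 0]` would not return the first element.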