
Migrated external files (from dj 0.11.x) are not accessible with newer dj

Status: Open · ecobost opened this issue 2 years ago · 2 comments

I migrated the external storage of an older schema to dj 0.12.0, but could not fetch any of the data. By default, dj looks for each blob in the new folder structure (/xx/xx/uuid and so on), but files stored in the older style all sit in a single folder. I believe this case is supposed to be handled by the try/except here: https://github.com/datajoint/datajoint-python/blob/07c5553e403a3fdf13a0675a81b27ab128068aa4/datajoint/external.py#L203. However, when _download_buffer fails to find a file, it raises a FileNotFoundError rather than a MissingExternalFile exception. The except block therefore never catches it, and the code that reads the blob from its filepath as in 0.11.x (lines 205 to 218 of external.py) is never executed.
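The failure mode can be reproduced without DataJoint. The sketch below is a simplified stand-in, not DataJoint's code: `MissingExternalFile`, `download_buffer` and `get_blob` are hypothetical substitutes for the real class and methods, and the legacy fallback is reduced to reading a second path. As the traceback shows, `except MissingExternalFile` does not intercept the `FileNotFoundError` that `Path.read_bytes()` raises, so the fallback never runs.

```python
import tempfile
from pathlib import Path


class MissingExternalFile(Exception):
    """Hypothetical stand-in for DataJoint's MissingExternalFile.

    It is not a base class of FileNotFoundError, so an
    `except MissingExternalFile` clause cannot catch pathlib's error.
    """


def download_buffer(path):
    # Mirrors the file-protocol branch seen in the traceback.
    return Path(path).read_bytes()


def get_blob(new_path, legacy_path):
    try:
        return download_buffer(new_path)
    except MissingExternalFile:
        # Intended fallback for 0.11-style files; unreachable here because
        # read_bytes() raises FileNotFoundError, not MissingExternalFile.
        return download_buffer(legacy_path)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        legacy = Path(tmp, "legacy_blob")
        legacy.write_bytes(b"payload")
        missing = Path(tmp, "1b", "70", "1b70bd9b")  # new-style path, absent
        try:
            get_blob(missing, legacy)
        except FileNotFoundError:
            return "FileNotFoundError escaped the fallback"
        return "fallback used"


print(demo())
```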

Here's what the error trace looks like:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Input In [4], in <module>
----> 1 (data.Responses.PerImage() & k).fetch('response')

File /usr/local/lib/python3.8/dist-packages/datajoint/fetch.py:229, in Fetch.__call__(self, offset, limit, order_by, format, as_dict, squeeze, download_path, *attrs)
    227 attributes = [a for a in attrs if not is_key(a)]
    228 ret = self._expression.proj(*attributes)
--> 229 ret = ret.fetch(
    230     offset=offset,
    231     limit=limit,
    232     order_by=order_by,
    233     as_dict=False,
    234     squeeze=squeeze,
    235     download_path=download_path,
    236     format="array",
    237 )
    238 if attrs_as_dict:
    239     ret = [
    240         {k: v for k, v in zip(ret.dtype.names, x) if k in attrs}
    241         for x in ret
    242     ]

File /usr/local/lib/python3.8/dist-packages/datajoint/fetch.py:289, in Fetch.__call__(self, offset, limit, order_by, format, as_dict, squeeze, download_path, *attrs)
    286     raise e
    287 for name in heading:
    288     # unpack blobs and externals
--> 289     ret[name] = list(map(partial(get, heading[name]), ret[name]))
    290 if format == "frame":
    291     ret = pandas.DataFrame(ret).set_index(heading.primary_key)

File /usr/local/lib/python3.8/dist-packages/datajoint/fetch.py:111, in _get(connection, attr, data, squeeze, download_path)
    103         safe_write(local_filepath, data.split(b"\0", 1)[1])
    104     return adapt(str(local_filepath))  # download file from remote store
    106 return adapt(
    107     uuid.UUID(bytes=data)
    108     if attr.uuid
    109     else (
    110         blob.unpack(
--> 111             extern.get(uuid.UUID(bytes=data)) if attr.is_external else data,
    112             squeeze=squeeze,
    113         )
    114         if attr.is_blob
    115         else data
    116     )
    117 )

File /usr/local/lib/python3.8/dist-packages/datajoint/external.py:203, in ExternalTable.get(self, uuid)
    201 if blob is None:
    202     try:
--> 203         blob = self._download_buffer(self._make_uuid_path(uuid))
    204     except MissingExternalFile:
    205         if not SUPPORT_MIGRATED_BLOBS:

File /usr/local/lib/python3.8/dist-packages/datajoint/external.py:144, in ExternalTable._download_buffer(self, external_path)
    142     return self.s3.get(external_path)
    143 if self.spec["protocol"] == "file":
--> 144     return Path(external_path).read_bytes()
    145 assert False

File /usr/lib/python3.8/pathlib.py:1207, in Path.read_bytes(self)
   1203 def read_bytes(self):
   1204     """
   1205     Open the file in bytes mode, read it, and close the file.
   1206     """
-> 1207     with self.open(mode='rb') as f:
   1208         return f.read()

File /usr/lib/python3.8/pathlib.py:1200, in Path.open(self, mode, buffering, encoding, errors, newline)
   1198 if self._closed:
   1199     self._raise_closed()
-> 1200 return io.open(self, mode, buffering, encoding, errors, newline,
   1201                opener=self._opener)

File /usr/lib/python3.8/pathlib.py:1054, in Path._opener(self, name, flags, mode)
   1052 def _opener(self, name, flags, mode=0o666):
   1053     # A stub for the opener argument to built-in open()
-> 1054     return self._accessor.open(self, flags, mode)

FileNotFoundError: [Errno 2] No such file or directory: '/external/neuro-static/neurostatic_dec_data/1b/70/1b70bd9bdadc2cee3a0e1532e7ea5b69'
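The path in the error message shows the new layout: the blob's UUID hex string is nested under two folders named after its first two pairs of hex characters. The sketch below rebuilds that path with `pathlib`. The helper `make_uuid_path` is hypothetical (it is not DataJoint's internal `_make_uuid_path`), it assumes `/external/neuro-static` is the store location and `neurostatic_dec_data` the schema name, and the `(2, 2)` subfolding is inferred from this single path.

```python
import uuid
from pathlib import PurePosixPath


def make_uuid_path(location, database, blob_uuid, subfolding=(2, 2)):
    """Hypothetical helper: <location>/<database>/<sub>/<sub>/<uuid hex>.

    Each entry in `subfolding` is the width of one nested folder name,
    taken in order from the start of the UUID's hex string.
    """
    hex_name = blob_uuid.hex
    folders = []
    start = 0
    for width in subfolding:
        folders.append(hex_name[start:start + width])
        start += width
    return PurePosixPath(location, database, *folders, hex_name)


blob_uuid = uuid.UUID("1b70bd9bdadc2cee3a0e1532e7ea5b69")
new_style = make_uuid_path("/external/neuro-static", "neurostatic_dec_data", blob_uuid)
print(new_style)  # /external/neuro-static/neurostatic_dec_data/1b/70/1b70bd9bdadc2cee3a0e1532e7ea5b69
```

Files written by 0.11.x sit in a single folder instead, so a lookup built this way finds nothing for them and must fall back to the stored filepath.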

ecobost, Jun 19 '22 22:06

Here's a straightforward way to solve it: https://github.com/atlab/datajoint-python/commit/6a0f2c10186c825dacc48268bbb2b36038c7aa8b It catches the FileNotFoundError and raises a MissingExternalFile instead. It works well.
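The idea in that commit, translating the OS-level error into the exception the caller already handles, can be sketched as follows. This is not the commit's actual diff: `MissingExternalFile`, `download_buffer` and `get_blob` are hypothetical stand-ins, and the legacy fallback is simplified to reading a second path.

```python
import tempfile
from pathlib import Path


class MissingExternalFile(Exception):
    """Hypothetical stand-in for DataJoint's MissingExternalFile."""


def download_buffer(path):
    # Translate the OS-level error into the exception that get_blob expects.
    try:
        return Path(path).read_bytes()
    except FileNotFoundError as err:
        raise MissingExternalFile(f"Missing external file {path}") from err


def get_blob(new_path, legacy_path):
    try:
        return download_buffer(new_path)
    except MissingExternalFile:
        # Fallback for files still stored in the flat 0.11-style layout.
        return download_buffer(legacy_path)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        legacy = Path(tmp, "legacy_blob")
        legacy.write_bytes(b"payload")
        return get_blob(Path(tmp, "1b", "70", "missing"), legacy)


print(demo())  # b'payload'
```

Raising with `from err` keeps the original `FileNotFoundError` attached as `__cause__`, so the real missing path still appears in the traceback if the fallback also fails.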

ecobost, Jun 20 '22 00:06

@ecobost Thanks for finding this solution. Would you like to submit a PR?

We don't know of many groups that need to migrate from 0.11, so we may not be able to add functionality and tests for every corner case.

dimitri-yatsenko, Jun 20 '22 12:06