datajoint-python
Migrated external files (from dj 0.11.x) are not accessible with newer dj
I migrated the external store of an older schema to dj 0.12.0, but was not able to fetch any of the data. By default, it looks for the blob in the new folder structure (/xx/xx/uuid and so on), but files in the older style are all stored in a single folder. I believe this case is supposed to be handled by the try/except here: https://github.com/datajoint/datajoint-python/blob/07c5553e403a3fdf13a0675a81b27ab128068aa4/datajoint/external.py#L203 However, when _download_buffer fails to find a file, it raises FileNotFoundError rather than MissingExternalFile. The exception is therefore not caught by the except block, and the code that reads the blob from its filepath as in 0.11.x (lines 205 to 218) is never executed.
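To make the mismatch concrete, here is a minimal, self-contained sketch with no datajoint dependency. MissingExternalFile is a local stand-in defined as a plain Exception subclass (an assumption for illustration; it is not related to FileNotFoundError), and download_buffer/get only mimic the "file" protocol branch that appears in the traceback.

```python
from pathlib import Path


class MissingExternalFile(Exception):
    """Local stand-in for datajoint's MissingExternalFile (assumption:
    it does not derive from FileNotFoundError)."""


def download_buffer(external_path):
    # Mimics the "file" protocol branch: a missing file surfaces as
    # FileNotFoundError, not as MissingExternalFile.
    return Path(external_path).read_bytes()


def get(new_style_path):
    try:
        return download_buffer(new_style_path)
    except MissingExternalFile:
        # Intended fallback for blobs migrated from 0.11.x. It is never
        # reached here because the raised exception type does not match.
        return b"legacy fallback"
```

Calling get() on a path that does not exist lets FileNotFoundError propagate straight through the try block, so the fallback branch is effectively dead code for local ("file") stores.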
Here's what the error trace looks like:
```
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Input In [4], in <module>
----> 1 (data.Responses.PerImage() & k).fetch('response')
File /usr/local/lib/python3.8/dist-packages/datajoint/fetch.py:229, in Fetch.__call__(self, offset, limit, order_by, format, as_dict, squeeze, download_path, *attrs)
227 attributes = [a for a in attrs if not is_key(a)]
228 ret = self._expression.proj(*attributes)
--> 229 ret = ret.fetch(
230 offset=offset,
231 limit=limit,
232 order_by=order_by,
233 as_dict=False,
234 squeeze=squeeze,
235 download_path=download_path,
236 format="array",
237 )
238 if attrs_as_dict:
239 ret = [
240 {k: v for k, v in zip(ret.dtype.names, x) if k in attrs}
241 for x in ret
242 ]
File /usr/local/lib/python3.8/dist-packages/datajoint/fetch.py:289, in Fetch.__call__(self, offset, limit, order_by, format, as_dict, squeeze, download_path, *attrs)
286 raise e
287 for name in heading:
288 # unpack blobs and externals
--> 289 ret[name] = list(map(partial(get, heading[name]), ret[name]))
290 if format == "frame":
291 ret = pandas.DataFrame(ret).set_index(heading.primary_key)
File /usr/local/lib/python3.8/dist-packages/datajoint/fetch.py:111, in _get(connection, attr, data, squeeze, download_path)
103 safe_write(local_filepath, data.split(b"\0", 1)[1])
104 return adapt(str(local_filepath)) # download file from remote store
106 return adapt(
107 uuid.UUID(bytes=data)
108 if attr.uuid
109 else (
110 blob.unpack(
--> 111 extern.get(uuid.UUID(bytes=data)) if attr.is_external else data,
112 squeeze=squeeze,
113 )
114 if attr.is_blob
115 else data
116 )
117 )
File /usr/local/lib/python3.8/dist-packages/datajoint/external.py:203, in ExternalTable.get(self, uuid)
201 if blob is None:
202 try:
--> 203 blob = self._download_buffer(self._make_uuid_path(uuid))
204 except MissingExternalFile:
205 if not SUPPORT_MIGRATED_BLOBS:
File /usr/local/lib/python3.8/dist-packages/datajoint/external.py:144, in ExternalTable._download_buffer(self, external_path)
142 return self.s3.get(external_path)
143 if self.spec["protocol"] == "file":
--> 144 return Path(external_path).read_bytes()
145 assert False
File /usr/lib/python3.8/pathlib.py:1207, in Path.read_bytes(self)
1203 def read_bytes(self):
1204 """
1205 Open the file in bytes mode, read it, and close the file.
1206 """
-> 1207 with self.open(mode='rb') as f:
1208 return f.read()
File /usr/lib/python3.8/pathlib.py:1200, in Path.open(self, mode, buffering, encoding, errors, newline)
1198 if self._closed:
1199 self._raise_closed()
-> 1200 return io.open(self, mode, buffering, encoding, errors, newline,
1201 opener=self._opener)
File /usr/lib/python3.8/pathlib.py:1054, in Path._opener(self, name, flags, mode)
1052 def _opener(self, name, flags, mode=0o666):
1053 # A stub for the opener argument to built-in open()
-> 1054 return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/external/neuro-static/neurostatic_dec_data/1b/70/1b70bd9bdadc2cee3a0e1532e7ea5b69'
```
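The path in the error shows the 0.12-style layout the lookup tries first: two 2-character subfolders taken from the UUID hex, then the full hex as the filename. The sketch below reproduces that path under stated assumptions: a (2, 2) subfolding inferred from the "1b/70" in the error message, and a hypothetical root equal to everything before those subfolders. The subfold helper is written here for illustration and is not claimed to be datajoint's own function.

```python
import uuid
from pathlib import PurePosixPath


def subfold(name, folds):
    """Split leading characters of `name` into folder names,
    e.g. folds=(2, 2) turns '1b70bd...' into ('1b', '70')."""
    if not folds:
        return ()
    return (name[: folds[0]].lower(),) + subfold(name[folds[0]:], folds[1:])


def new_style_path(root, blob_uuid, folds=(2, 2)):
    # 0.12-style layout (assumed): root/xx/xx/<uuid hex>
    hexname = blob_uuid.hex
    return PurePosixPath(root, *subfold(hexname, folds), hexname)


u = uuid.UUID("1b70bd9bdadc2cee3a0e1532e7ea5b69")
print(new_style_path("/external/neuro-static/neurostatic_dec_data", u))
# prints /external/neuro-static/neurostatic_dec_data/1b/70/1b70bd9bdadc2cee3a0e1532e7ea5b69
```

A blob migrated from 0.11.x is not stored at this computed location, which is why the read fails and the migrated-blob fallback is needed.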
Here's a straightforward way to solve it: https://github.com/atlab/datajoint-python/commit/6a0f2c10186c825dacc48268bbb2b36038c7aa8b They catch the FileNotFoundError and raise MissingExternalFile instead. It works well.
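A minimal sketch of the idea described in that comment (translate FileNotFoundError into MissingExternalFile so the existing except block fires). This is not the code from the linked commit: MissingExternalFile is a local stand-in, only the "file" protocol is covered, and legacy_path is a hypothetical parameter standing in for however the migrated filepath is looked up.

```python
from pathlib import Path


class MissingExternalFile(Exception):
    """Local stand-in for datajoint's MissingExternalFile (assumption)."""


def download_buffer(external_path):
    # Convert the OS-level error into the exception the caller handles,
    # chaining the original FileNotFoundError for debugging.
    try:
        return Path(external_path).read_bytes()
    except FileNotFoundError as err:
        raise MissingExternalFile(f"Missing external file {external_path}") from err


def get(new_style_path, legacy_path):
    try:
        return download_buffer(new_style_path)
    except MissingExternalFile:
        # Fallback for blobs migrated from 0.11.x; `legacy_path` is a
        # hypothetical stand-in for the stored filepath lookup.
        return download_buffer(legacy_path)
```

Raising with "from err" keeps the original error available as __cause__, so a genuinely missing file still produces an informative traceback.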
@ecobost Thanks for finding this solution. Would you like to submit a PR?
We do not know of many groups that need to migrate from 0.11, so we may not be able to add functionality and tests for every corner case.