datajoint-python
Convenience functions to insert / fetch when an attach field is in table definition
Feature Request
Problem
When inserting into a table whose definition includes a field result : attach@minio, the insert method expects a file path. Similarly, fetch stores a file locally and returns its path. This is often inconvenient, because (i) the data saved in the file is needed as an object in the Python script being executed, and (ii) the saved / downloaded files remain on local storage even after the script terminates.
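For illustration, this is roughly what the current workflow looks like (table name, key, and paths are made up; download_path is the fetch argument already used for attach attributes):

import pickle

# insert: the object must first be written to a file by hand
with open("/tmp/result.pkl", "wb") as f:
    pickle.dump(my_result, f)
MyTable.insert1(dict(key, result="/tmp/result.pkl"))

# fetch: DataJoint downloads the attachment and returns the local path,
# which then has to be opened and cleaned up manually
path = (MyTable & key).fetch("result", download_path="/tmp")[0]
with open(path, "rb") as f:
    my_result = pickle.load(f)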
Requirements
Possible solution: introduce a parameter to insert that automatically saves the data to be inserted to a file, inserts that file into the table, and then removes it. Similarly, fetch could save the file and return the data loaded back into the Python script.
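To make the intent concrete, here is a purely hypothetical sketch of what such an API could look like; the parameter names serialize and as_object do not exist in DataJoint and only illustrate the proposal:

# hypothetical: insert pickles `result` to a temp file, uploads it, then deletes the file
MyTable.insert1(dict(key, result=my_result), serialize=True)

# hypothetical: fetch downloads to a temp location, unpickles, and cleans up afterwards
my_result = (MyTable & key).fetch1("result", as_object=True)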
Justification
See the Problem section.
Alternative Considerations
Currently I am using an AttachMixin as a workaround, i.e. my table is defined as class MyTable(AttachMixin, dj.Computed). The mixin, shown below with a usage sketch after it, could serve as the code basis for the feature I am suggesting, although it would need a bit of improvement.
import os
import pickle
import tempfile
from itertools import product
from typing import Any, Dict, Iterable, List, Optional, Union

import numpy as np
import datajoint as dj


class AttachMixin:
    def attach_insert(self, keys: List[Dict[str, Any]], attach_keys: Iterable[str]) -> None:
        """Pickle the values under `attach_keys` to temporary files and insert the file paths."""
        if not isinstance(attach_keys, list):
            raise ValueError("attach_keys must be a list")
        with tempfile.TemporaryDirectory(dir=os.environ.get("TMP", ".")) as temp_dir:
            for (i, key), ak in product(enumerate(keys), attach_keys):
                # create_random_str() is a small helper (not shown) that returns a unique file name
                path = os.path.join(temp_dir, create_random_str() + ".pkl")
                with open(path, "wb") as f:
                    pickle.dump(key[ak], f)
                keys[i][ak] = path
            self.insert(keys)

    def attach_insert1(self, key: Dict[str, Any], attach_keys: Iterable[str]) -> None:
        self.attach_insert([key], attach_keys)

    def attach_fetch(
        self,
        *attrs: str,
        key: Optional[Dict[str, Any]] = None,
        **kwargs,
    ) -> Union[Dict[str, Any], List]:
        """Fetch into a temporary directory and unpickle any attached .pkl files."""
        key = key or {}
        with tempfile.TemporaryDirectory(dir=os.environ.get("TMP", ".")) as temp_dir:
            ret = (self & key).fetch(*attrs, download_path=temp_dir, **kwargs)  # array or list of dicts
            if isinstance(ret, dict):
                ret = self._load_from_dict(ret)
            elif self._is_pkl_path(ret):
                # check the single-path case before the generic Iterable case: strings are iterable too
                with open(ret, "rb") as f:
                    ret = pickle.load(f)
            elif isinstance(ret, Iterable):
                ret = np.array(ret)
                for i, value in enumerate(ret):
                    if isinstance(value, dict):
                        ret[i] = self._load_from_dict(value)
                    elif self._is_pkl_path(value):
                        with open(value, "rb") as f:
                            ret[i] = pickle.load(f)
                    else:
                        raise NotImplementedError(f"Value {value} is not a dict or a pkl path")
            else:
                raise NotImplementedError(f"Return value {ret} is not a dict, Iterable, or a pkl path")
            return ret

    def attach_fetch1(
        self,
        *attrs: str,
        key: Optional[Dict[str, Any]] = None,
        **kwargs,
    ) -> Union[Dict[str, Any], List]:
        ret = self.attach_fetch(*attrs, key=key, **kwargs)
        if len(ret) > 1:
            raise dj.DataJointError(f"fetch1 should only return one tuple. {len(ret)} tuples were found")
        return ret[0]

    def _load_from_dict(self, d: Dict[str, str]) -> Dict[str, Any]:
        for key, value in d.items():
            if self._is_pkl_path(value):
                with open(value, "rb") as f:
                    d[key] = pickle.load(f)
        return d

    def _is_pkl_path(self, value) -> bool:
        return isinstance(value, str) and value.endswith(".pkl") and os.path.isfile(value)
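A usage sketch of the mixin (schema, upstream table, and the compute() call are illustrative):

import datajoint as dj

schema = dj.schema("my_schema")

@schema
class MyTable(AttachMixin, dj.Computed):
    definition = """
    -> Upstream
    ---
    result : attach@minio
    """

    def make(self, key):
        result = compute(key)  # some in-memory object
        # the mixin pickles `result` to a temporary file before inserting
        self.attach_insert1({**key, "result": result}, attach_keys=["result"])

# later: fetch the object back; the temporary download is unpickled and discarded
result = MyTable().attach_fetch1("result", key=key)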
Related
These issues might be (loosely) related: https://github.com/datajoint/datajoint-python/issues/1109 https://github.com/datajoint/datajoint-python/issues/1099
If you think such a feature would be helpful to include in DataJoint, I would be happy to help implement it.
I think you're suggesting some sort of user-provided functions on insert and on fetch for the attach type.
This is very much the idea of DataJoint's AttributeAdapter feature - see here.
With that feature, you can define a new DataJoint datatype (e.g. attach_pkl or something like that); a rough sketch follows the example links below.
See some examples here:
- https://github.com/dimitri-yatsenko/db-programming-with-datajoint/blob/master/notebooks/Adapted-Types.ipynb
- https://github.com/datajoint-company/db-programming-with-datajoint/blob/master/notebooks/NWB-Adapter.ipynb
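For example, a rough sketch of such an adapted type, following the pattern in those notebooks (the class name AttachPkl, the "minio" store name, and the assumption that get receives the downloaded local path for an attach-backed adapter are mine, not part of DataJoint):

import os
import pickle
import tempfile
import datajoint as dj


class AttachPkl(dj.AttributeAdapter):
    attribute_type = "attach@minio"  # underlying type the adapter stores into

    def put(self, obj):
        # on insert: pickle the object to a temporary file and hand DataJoint the path
        path = os.path.join(tempfile.mkdtemp(), "payload.pkl")
        with open(path, "wb") as f:
            pickle.dump(obj, f)
        return path

    def get(self, path):
        # on fetch: assumed to receive the downloaded local path; load the object back
        with open(path, "rb") as f:
            return pickle.load(f)


attach_pkl = AttachPkl()  # instance made available in the schema context

The instance would then be referenced in a table definition as result : <attach_pkl>, so that insert accepts and fetch returns Python objects directly.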