datajoint-python
Support meta information for `filepath` attributes.
An example use case is working with NWB files more elegantly. For a particular NWB object, we need to store two things: `object_id` (`varchar(36)`) and `nwb_file` (`filepath@store`). Currently `dj.AttributeAdapter` does not support this, so a workaround is to use `longblob` and store a tuple:
import datajoint as dj
import pynwb

class NWBObjectAdapter(dj.AttributeAdapter):
    attribute_type = 'longblob'
    # attribute_type = "('varchar(36)', 'filepath@store')"

    def put(self, nwbobj):
        # take any arbitrary NWB object and extract a tuple of: (object_id, nwb_filepath)
        nwb_fp = nwbobj.container_source
        obj_id = nwbobj.obj_id_to_store  # new addon field on the nwbobj to indicate which object to store
        return obj_id, nwb_fp  # this is the tuple that is stored in the DB

    def get(self, stored_tuple):
        obj_id, nwb_fp = stored_tuple
        io = pynwb.NWBHDF5IO(nwb_fp, mode='r')
        nwbf = io.read()
        return nwbf.objects[obj_id]
However, this workaround does not support the `filepath@store` type, which is crucial for working with NWB objects.
Proposed solution number 1:
A special feature for `filepath@store` (and potentially `attach@store`) to have meta_information attached to it.
An example of how that might look:
@schema
class NWBRaw(dj.Manual):
    definition = """
    -> Session
    ---
    nwbfile: filepath@store
    """

NWBRaw.insert1({**session_key, 'nwbfile': (nwb_filepath, {'object_id': obj_uuid})})
fp, meta = (NWBRaw & session_key).fetch1('nwbfile', fetch_meta=True)
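As a minimal sketch of how the insert side of this proposal might treat values, a bare path and a `(path, meta)` tuple could both be normalized into a path plus an optional metadata dict. `normalize_filepath_value` is a hypothetical helper written for illustration only; it is not part of DataJoint, and the actual feature may handle this differently.

```python
import pathlib


def normalize_filepath_value(value):
    """Split an inserted filepath value into (path, meta).

    Hypothetical helper, not a DataJoint API. Accepts either a bare path
    or a (path, meta_dict) tuple, mirroring the insert1 call above.
    """
    if isinstance(value, tuple):
        if len(value) != 2:
            raise ValueError("expected a (path, meta) tuple")
        path, meta = value
        if not isinstance(meta, dict):
            raise TypeError("meta must be a dict")
    else:
        # A bare path carries no metadata.
        path, meta = value, None
    return pathlib.PurePosixPath(path), meta


path, meta = normalize_filepath_value(("/data/session1.nwb", {"object_id": "abc"}))
```

With this shape, records inserted without metadata simply carry `None` as their meta, so both kinds of insert can coexist in the same attribute.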
Example `dj.AttributeAdapter` for an NWB object with this feature:
import pathlib

class NWBObjectAdapter(dj.AttributeAdapter):
    attribute_type = 'filepath@store'

    def put(self, nwbobj):
        nwb_fp = pathlib.Path(nwbobj.container_source)
        obj_id = nwbobj.obj_id_to_store
        return nwb_fp, dict(object_id=obj_id)

    def get(self, filepath):  # returned as a tuple: (filepath, meta)
        nwb_fp, meta_dict = filepath
        # wrap in Path so this works whether the path comes back as a str or a Path
        io = pynwb.NWBHDF5IO(pathlib.Path(nwb_fp).as_posix(), mode='r')
        nwbf = io.read()
        return nwbf.objects[meta_dict['object_id']]
The `fetch_meta` argument in `fetch` may be unnecessary. If the user inserts the filepath with metadata, it will come back with metadata as a tuple. That would be consistent and intuitive: you always fetch what you insert.
Nah, that just won't be a reasonable interface as just having one entry with meta can disrupt it. We do need to have a clean separation for when meta is returned vs not.
The separation is clean. If you insert a tuple, you fetch it back. It's simple, does not need to be explained. Users get back what they insert. If they choose to insert some records with metadata and some without, that's what they will get back too — straightforward and transparent.
It will just be much nicer to be able to fully expect if you are going to get a tuple vs list of strings precisely corresponding to the filepath. Meta provision should really be optional with no chance of disrupting the main usage of obtaining back the filepath.
There is a clear separation between actual data and metadata, and I find it completely consistent that we treat them separately. Let's proceed with `fetch_meta`-based behavior and discuss further as we see the examples.
Agreed. Yes, the option of skipping the metadata will be helpful.
Perhaps by default, `fetch_meta=None`, which means fetch whatever you inserted. `fetch_meta=True` always returns tuples. `fetch_meta=False` returns the paths only.
Hm, potentially. Although I'd really think it's enough to offer `True`/`False` behavior, defaulting to `False`.
Then this would introduce the inconsistency that you insert one thing and fetch another. The default behavior needs to be most consistent.
You are inserting metadata along with the data, and for that to be treated differently sounds just fine to me. It's not quite the same situation as inserting a tuple and expecting tuple back for a blob.
Is there a good reason to treat metadata differently? It's all just data. Special behaviors require extra documentation and explanations. Fetching what is inserted is consistent behavior through all other cases. If the user does not like it, they will look for the feature to skip the metadata.
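To make the three `fetch_meta` modes discussed above concrete, here is a rough sketch of how they would shape the fetched values. `apply_fetch_meta` is a hypothetical stand-in, not DataJoint's `fetch`, and it assumes each stored record is a `(path, meta)` pair where `meta` is `None` when no metadata was inserted.

```python
def apply_fetch_meta(stored, fetch_meta=None):
    """Shape fetched filepath values according to fetch_meta.

    Hypothetical illustration only. `stored` is a list of (path, meta)
    pairs; meta is None for records inserted without metadata.
    """
    if fetch_meta is None:
        # Return whatever was inserted: a tuple if meta was given, else the bare path.
        return [(p, m) if m is not None else p for p, m in stored]
    if fetch_meta:
        # Always return (path, meta) tuples, even when meta is None.
        return [(p, m) for p, m in stored]
    # Return the paths only.
    return [p for p, _ in stored]


records = [("a.nwb", {"object_id": "x"}), ("b.nwb", None)]
as_inserted = apply_fetch_meta(records)
always_tuples = apply_fetch_meta(records, fetch_meta=True)
paths_only = apply_fetch_meta(records, fetch_meta=False)
```

Note that in the `None` mode a table with mixed records yields a mix of tuples and bare paths, which is the situation one side of the discussion wants to avoid and the other considers transparent.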
Here is a more complete example using the custom data type for NWB objects.
class NWBTrace(dj.AttributeAdapter):
    """
    custom datajoint attribute type for NWB objects in NWB files
    """
    attribute_type = 'filepath@store'

    def put(self, nwbobj):
        nwb_path = nwbobj.container_source
        return nwb_path, nwbobj.trace_id_to_store

    def get(self, filepath):  # returned as a tuple: (filepath, meta)
        nwb_path, object_id = filepath
        # look the object up by id, as in the earlier adapter examples
        return pynwb.NWBHDF5IO(nwb_path, mode='r').read().objects[object_id]

nwb_trace = NWBTrace()
@schema
class Ephys(dj.Manual):
    definition = """
    -> Session
    ---
    trace: <nwb_trace>
    """
...
Ephys.insert1({**session_key, 'trace': (nwb_filepath, obj_uuid)})
trace = (Ephys & session_key).fetch1('trace')