datajoint-python

Published 20 hours ago

Support meta information for `filepath` attributes.

Open ttngu207 opened this issue 4 years ago • 13 comments

An example use case is working with NWB files more elegantly. For a particular NWB object, we need to store two things: object_id as varchar(36) and nwb_file as filepath@store. Currently dj.AttributeAdapter does not support this, so a workaround is to use longblob and store a tuple:

import datajoint as dj
import pynwb

class NWBObjectAdapter(dj.AttributeAdapter):
    attribute_type = 'longblob'
    # attribute_type = "('varchar(36)', 'filepath@store')"

    def put(self, nwbobj):
        # take any arbitrary NWB object and extract a tuple of: (object_id, nwb_filepath)
        nwb_fp = nwbobj.container_source
        obj_id = nwbobj.obj_id_to_store  # new addon field to the nwbobj to indicate which object to store
        return obj_id, nwb_fp  # this is the tuple that is stored in DB

    def get(self, stored_tuple):
        obj_id, nwb_fp = stored_tuple
        io = pynwb.NWBHDF5IO(nwb_fp, mode='r')
        nwbf = io.read()
        return nwbf.objects[obj_id]

However, this workaround implementation does not support the filepath@store type, which is crucial for working with NWB objects.

May 26 '20 14:05 ttngu207

Proposed solution number 1: a special feature for filepath@store (and potentially attach@store) to have meta information attached to it.

Example of how that might look:

@schema
class NWBRaw(dj.Manual):
    definition = """
    -> Session
    ---
    nwbfile: filepath@store
    """
NWBRaw.insert1({**session_key, 'nwbfile': (nwb_filepath, {'object_id': obj_uuid})})
fp, meta = (NWBRaw & session_key).fetch1('nwbfile', fetch_meta=True)

Example dj.AttributeAdapter for NWB object with this feature:

class NWBObjectAdapter(dj.AttributeAdapter):
    attribute_type = 'filepath@store'
    def put(self, nwbobj):
        nwb_fp = pathlib.Path(nwbobj.container_source)
        obj_id = nwbobj.obj_id_to_store  
        return nwb_fp, dict(object_id=obj_id)
    def get(self, filepath):  # returned as a tuple: (filepath, meta)
        nwb_fp, meta_dict = filepath
        io = pynwb.NWBHDF5IO(nwb_fp.as_posix(), mode='r')
        nwbf = io.read()
        return nwbf.objects[meta_dict['object_id']]

May 27 '20 23:05 ttngu207

The fetch_meta argument in fetch may be unnecessary. If the user inserts the filepath with metadata, it will come back with metadata as a tuple. That would be consistent and intuitive: you always fetch what you insert.

May 28 '20 00:05 dimitri-yatsenko

Nah, that just won't be a reasonable interface as just having one entry with meta can disrupt it. We do need to have a clean separation for when meta is returned vs not.

May 28 '20 00:05 eywalker

The separation is clean. If you insert a tuple, you fetch it back. It's simple, does not need to be explained. Users get back what they insert. If they choose to insert some records with metadata and some without, that's what they will get back too — straightforward and transparent.

May 28 '20 00:05 dimitri-yatsenko

It would be much nicer to know in advance whether you are going to get tuples or a list of strings corresponding exactly to the filepaths. Providing meta should really be optional, with no chance of disrupting the main usage of getting back the filepath.
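To make the concern concrete, here is a minimal sketch in plain Python (no DataJoint calls) of what a caller might face if fetch returned exactly what was inserted. The `fetched` list and the `split_path_and_meta` helper are hypothetical and exist only for illustration; they are not part of any existing API.

```python
from pathlib import Path

# Hypothetical result of fetching a filepath column where only some
# rows were inserted with metadata (illustrative values only).
fetched = [
    Path("session1.nwb"),
    (Path("session2.nwb"), {"object_id": "abc"}),
]


def split_path_and_meta(value):
    """Normalize one fetched value to a (path, meta) pair."""
    if isinstance(value, tuple):
        path, meta = value
        return path, meta
    # No metadata was inserted for this row.
    return value, None


# Every consumer that only wants paths has to normalize first.
paths = [split_path_and_meta(v)[0] for v in fetched]
```

Under this assumption, a single row inserted with meta is enough to force the type check on every caller.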

May 28 '20 00:05 eywalker

There is a clear separation between actual data and metadata, and I find it completely consistent that we treat them separately. Let's proceed with fetch_meta based behavior and discuss further as we see the examples.

May 28 '20 00:05 eywalker

Agreed. Yes, the option of skipping the metadata will be helpful.

May 28 '20 00:05 dimitri-yatsenko

Perhaps by default, fetch_meta=None, which means fetch whatever you inserted. fetch_meta=True returns tuples always. fetch_meta=False returns the paths only.
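A minimal sketch of these three modes, written as plain Python over already-stored values rather than against DataJoint itself. The function name `apply_fetch_meta`, the representation of a stored value (a bare path, or a `(path, meta)` tuple), and returning `None` as the meta for rows inserted without one are all assumptions for illustration.

```python
def apply_fetch_meta(stored, fetch_meta=None):
    """Shape one stored value according to the proposed fetch_meta modes.

    stored: either a bare path or a (path, meta) tuple (assumed layout).
    """
    has_meta = isinstance(stored, tuple)
    path, meta = stored if has_meta else (stored, None)

    if fetch_meta is None:
        # Default: return whatever was inserted, unchanged.
        return stored
    if fetch_meta:
        # Always a tuple; meta is None when none was inserted (assumption).
        return path, meta
    # fetch_meta=False: the path only.
    return path


rows = ["a.nwb", ("b.nwb", {"object_id": "abc"})]
as_inserted = [apply_fetch_meta(r) for r in rows]
with_meta = [apply_fetch_meta(r, fetch_meta=True) for r in rows]
paths_only = [apply_fetch_meta(r, fetch_meta=False) for r in rows]
```

Only the default mode can yield a mix of shapes; the two explicit modes always return a uniform result.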

May 28 '20 00:05 dimitri-yatsenko

Hm, potentially. Although I'd really think it's enough to offer True/False behavior defaulting to False.

May 28 '20 00:05 eywalker

Then this would introduce the inconsistency that you insert one thing and fetch another. The default behavior needs to be most consistent.

May 28 '20 00:05 dimitri-yatsenko

You are inserting metadata along with the data, and for that to be treated differently sounds just fine to me. It's not quite the same situation as inserting a tuple and expecting tuple back for a blob.

May 28 '20 00:05 eywalker

Is there a good reason to treat metadata differently? It's all just data. Special behaviors require extra documentation and explanations. Fetching what is inserted is consistent behavior through all other cases. If the user does not like it, they will look for the feature to skip the metadata.

May 28 '20 00:05 dimitri-yatsenko

Here is a more complete example using the custom data type for NWB objects.

class NWBTrace(dj.AttributeAdapter):
    """
    custom datajoint attribute type for NWB objects in NWB files
    """

    attribute_type = 'filepath@store'

    def put(self, nwbobj):
        nwb_path = nwbobj.container_source
        return nwb_path, nwbobj.trace_id_to_store

    def get(self, filepath):  # returned as a tuple: (filepath, meta)
        nwb_path, object_id = filepath
        return pynwb.NWBHDF5IO(nwb_path, mode='r').read().objects[object_id]


nwb_trace = NWBTrace()


@schema
class Ephys(dj.Manual):
    definition = """
    -> Session
    ---
    trace: <nwb_trace>    
    """

...

Ephys.insert1({**session_key, 'trace': (nwb_filepath, obj_uuid)})
trace = (Ephys & session_key).fetch1('trace')

May 28 '20 00:05 dimitri-yatsenko