datajoint-python
datajoint-python copied to clipboard
Restriction in table definition
I thought I had read something about this before, but couldn't find an issue about it, so apologies if this is a duplicate.
When defining a table with a foreign key, it sometimes is desired to restrict the possible entries based on the attributes of the foreign table. For example, if a table tracks the birth of litters, with a foreign key into an Animal table representing the mother, we ought to make sure that the mother is female. Currently this is easily achievable in python (or MATLAB) code wrapping around the insert method, and the key_source property helps with this for auto-populated tables, but it would be clarifying to have this in the table definition, especially for manual tables. Something like:
-> Animal & 'sex="F"'
A cursory look suggests the implementation would be relatively simple, involving updating the foreign key parser, storing the restriction under the foreign key in the Dependencies graph (similar to how aliases are stored), updating the Table.parents method to apply the restriction, and performing a check in the Table.insert method against the restriction.
This is loosely related to #324, which discusses including the join operator in the table definition.
Great thinking. Generally, the full DataJoint data model envisions being able to declare foreign keys to any query expressions https://arxiv.org/abs/1807.11104. Currently, datajoint implements projection: -> Animal.proj(subject_id='animal_id') but not other operators yet.
For restrictions, ideally, this would be something that would translate into server-side checks rather than being checked programmatically on the client side. This could be done with CHECK constraints or TRIGGERs. Then it would need to be properly reversed-engineered from the DDL statements back into DataJoint table definition code.
@zfj1 Do you have a compelling use case where this feature becomes particularly useful?
@dimitri-yatsenko thanks for your thoughts. Of course, the manuscript is exactly where I thought I had come across this before. I agree that the server-side implementation is much more ideal.
To answer your question, I suppose it depends a bit on what you find compelling. As I mentioned restricting the valid entries in a table is already achievable by wrapping the insert with explicit checks or defining a restricted key_source, but I think baking this into the table definition would be much more user-friendly and portable.
Here's an example similar to what we're currently using:
::Protocol
protocol_name : varchar(64)
---
uses_visual_stimulus : enum('F','T')
uses_electrophysiology : enum('F','T')
::Epoch
epoch_id : int unsigned
---
-> Protocol
start_time : time
::Epoch.Image
-> Epoch & (Protocol & 'uses_visual_stimulus="T"')
---
image : blob
::Epoch.Electrode
-> Epoch & (Protocol & 'uses_electrophysiology="T"')
---
recording_mode : enum('Current Clamp', 'Voltage Clamp', 'Dynamic Clamp')
In the long term, we plan to implement the low-level functionality centrally in a compiled language or on the server side. See https://github.com/datajoint/datajoint-core for the ongoing effort to shift the low-level functionality into a compiled core shared across all datajoint implementations. For now, we are keeping the python client as thin as possible with only most critical functionality.
The case above indeed is a nice illustration of the principle. The benefit of this restriction is to express and enforce additional integrity constraints in a natural way. It's definitely something that will be implemented in the future. Doing this in the Python client would just be too complex and will reduce performance. The foreign key check must be performed within a transaction. We would also need to come up with a way to store this restriction within the table definition, probably in the table comment or something.
I think for now, we will defer this until we can implement this in a more systematic way on the server side on within the upcoming Datajoint Core.