spyglass
spyglass copied to clipboard
Proposal for tracking row-based provenance
For common tables, users may accidentally (or intentionally) modify a row created by another user. So it is useful to know who was the last person who modified a row and when. Similarly, it is useful to know who created the row in the first place.
A couple types of actions (declare table, drop table, delete from table) are tracked in the DataJoint hidden ~log
table that is part of each database (e.g., common_nwbfile
), but those are limited to table-wide actions and deletions, and it may be hard to parse for everyday users.
There are many ways to do this, but since we care only about the latest state of each row, one common way seems to be to add columns to the data tables about the latest state:
-
created_by
orcreation_user
(user) (with or without IP address) -
created_on
orcreated_time
orcreation_timestamp
(datetime) -
last_modified_by
(user) -
last_modified_on
orlast_modified_time
orlast_modification_timestamp
(datetime) -
last_modified_reason
(string, e.g., "database migration" or "fix typo") (not sure if useful) -
last_modified_spyglass_version
(string) (not sure if useful) - (we could also add fields like
spyglass_environment
- a text-based dump of the virtual environment, host name, and operating system, but I think that is overkill and would result in a bloated database. we have thought about this for NWB and are settling on version of the primary software package as being good enough for our initial attempt at tracking provenance)
This is now doable in the latest version of DataJoint (not released yet?) without cluttering the common display of tables (i.e., the columns are "hidden" but can be queried/displayed).
Alternative: we could create a history table for every data table and add a foreign key from the data table to the history table. I'm not sure if this separation adds any value now that we can have hidden columns.
Concern: Both approaches will increase the size of the database. Is it worthwhile?
It would be nice if this were built into DataJoint as mysql triggers, but until then, we could add the values every time we call populate
.
@CBroz1 noted:
I think it would require the mixin intercepting the definition attr before table declaration and appending new lines. There would also be some migration effort of altering all existing tables
Questions:
- Is this enhancement worth making? How often are tables modified in a way that it is useful to track who/when/why a change was made?
- If yes, what do you think about the approach?