datajoint-python
datajoint-python copied to clipboard
Deprecate using timestamps as primary keys
Feature Request
Problem
The tutorial appears to encourage the use of timestamps as primary keys. However, this is widely seen as an antipattern in the databases community because of the chance of collisions. This becomes very hard to debug if inserts are coming from multiple threads or even multiple machines, or if the user has skip_duplicates=True
set leading to a small number of entries simply being silently dropped.
Even when used in composite keys, ambiguity is possible as the user may rely on the timestamp to provide uniqueness and start duplicating other primary keys, or they may use the timestamp key in constraints, leading to the wrong rows being selected in another table.
Requirements
Update the tutorial to caution users against the use of a timestamp as a primary key. In a future release, remove support for timestamps, or at least the CURRENT_TIMESTAMP
default value in primary keys
Related Errors
Will cause DuplicateError
if two inserts happen within the time resolution of the timestamp
Additional Research and Context
https://dba.stackexchange.com/questions/168356/is-using-timestamp-as-primary-key-good-idea-mysql-mariadb https://dba.stackexchange.com/questions/214110/can-current-timestamp-be-used-as-a-primary-key https://dba.stackexchange.com/questions/213899/is-there-a-strategy-to-use-timestamp-as-primary-key
These are good points but I would not go as far as deprecate the use of timestamps as part of a primary key. There are many cases where it can be appropriate. We can make recommendations as to what makes a good primary key.
We have considered prohibiting default values for primary key attributes altogether, since the explicit identification of the inserted entities is a key component of entity integrity. However, we regularly run into situations where such use can be justified. This goes for auto_increment values, random uuids, and timestamps. We will continue improving the documentation and presentation of principles of data integrity.