datajoint-python icon indicating copy to clipboard operation
datajoint-python copied to clipboard

Deprecate using timestamps as primary keys

Open zhenruiliao opened this issue 1 year ago • 1 comments

Feature Request

Problem

The tutorial appears to encourage the use of timestamps as primary keys. However, this is widely seen as an antipattern in the databases community because of the chance of collisions. This becomes very hard to debug if inserts are coming from multiple threads or even multiple machines, or if the user has skip_duplicates=True set leading to a small number of entries simply being silently dropped.

Even when used in composite keys, ambiguity is possible as the user may rely on the timestamp to provide uniqueness and start duplicating other primary keys, or they may use the timestamp key in constraints, leading to the wrong rows being selected in another table.

Requirements

Update the tutorial to caution users against the use of a timestamp as a primary key. In a future release, remove support for timestamps, or at least the CURRENT_TIMESTAMP default value in primary keys

Related Errors

Will cause DuplicateError if two inserts happen within the time resolution of the timestamp

Additional Research and Context

https://dba.stackexchange.com/questions/168356/is-using-timestamp-as-primary-key-good-idea-mysql-mariadb https://dba.stackexchange.com/questions/214110/can-current-timestamp-be-used-as-a-primary-key https://dba.stackexchange.com/questions/213899/is-there-a-strategy-to-use-timestamp-as-primary-key

zhenruiliao avatar Jul 10 '22 15:07 zhenruiliao

These are good points but I would not go as far as deprecate the use of timestamps as part of a primary key. There are many cases where it can be appropriate. We can make recommendations as to what makes a good primary key.

We have considered prohibiting default values for primary key attributes altogether, since the explicit identification of the inserted entities is a key component of entity integrity. However, we regularly run into situations where such use can be justified. This goes for auto_increment values, random uuids, and timestamps. We will continue improving the documentation and presentation of principles of data integrity.

dimitri-yatsenko avatar Aug 05 '22 15:08 dimitri-yatsenko