Advanced handling of duplicates in `insert1`

Open dimitri-yatsenko opened this issue 3 years ago • 3 comments

Feature Request

Allow new ways of handling different types of duplicates in insert1

Problem

Currently, there is only one way to skip duplicate inserts: `skip_duplicates=True` ignores all duplicates, whether the collision is on the primary key or on a secondary unique index. There are cases, however, when only specific types of duplicates should be skipped.

Requirements

Condition 1. For a primary duplicate, we need an option to ignore the duplicate only if the entry matches on all the secondary unique indexes. This is helpful for tables that map unique indexes between two identification systems.

Condition 2. For a secondary duplicate, it may be helpful to include in the error message the primary key of the duplicate entry already in the database.

Both conditions will require a second query and only apply to insert1 rather than insert.

This could be addressed by allowing other values besides True or False for the skip_duplicates argument in insert1. Considering that both conditions should probably appear together, we can name this option "match":

```python
table.insert1(entry, skip_duplicates='match')
```
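
To make the two conditions concrete, here is a minimal sketch of the proposed semantics using a plain in-memory stand-in rather than DataJoint itself. `ToyTable`, its `rows` list, and this `DuplicateError` class are hypothetical names invented for illustration; the real feature would run a second query against the database instead of scanning a list, and its error messages may differ.

```python
class DuplicateError(Exception):
    """Stand-in for a database duplicate-key error (hypothetical)."""


class ToyTable:
    """In-memory model of a table with a primary key and unique indexes."""

    def __init__(self, primary_key, unique_indexes):
        self.primary_key = tuple(primary_key)
        self.unique_indexes = [tuple(ix) for ix in unique_indexes]
        self.rows = []

    def _find(self, attrs, entry):
        # Return the first stored row that matches `entry` on all `attrs`.
        for row in self.rows:
            if all(row[a] == entry[a] for a in attrs):
                return row
        return None

    def insert1(self, entry, skip_duplicates=False):
        """Insert one row; return True if inserted, False if skipped."""
        primary_dup = self._find(self.primary_key, entry)
        if primary_dup is not None:
            if skip_duplicates is True:
                return False  # existing behavior: skip any duplicate
            if skip_duplicates == "match":
                # Condition 1: skip only if every unique-index attribute
                # of the existing row agrees with the new entry.
                if all(primary_dup[a] == entry[a]
                       for ix in self.unique_indexes for a in ix):
                    return False
                raise DuplicateError(
                    "primary key exists but unique-index values differ")
            raise DuplicateError("duplicate primary key")

        for ix in self.unique_indexes:
            secondary_dup = self._find(ix, entry)
            if secondary_dup is not None:
                if skip_duplicates is True:
                    return False
                if skip_duplicates == "match":
                    # Condition 2: report the primary key of the row that
                    # already holds these unique-index values.
                    existing = {a: secondary_dup[a] for a in self.primary_key}
                    raise DuplicateError(
                        f"unique index {ix} already used by primary key {existing}")
                raise DuplicateError(f"duplicate on unique index {ix}")

        self.rows.append(dict(entry))
        return True


# Usage: a table mapping an internal id to an external identifier.
table = ToyTable(primary_key=["subject_id"], unique_indexes=[["external_id"]])
table.insert1({"subject_id": 1, "external_id": "A"})
# Same mapping again: silently skipped under 'match'.
table.insert1({"subject_id": 1, "external_id": "A"}, skip_duplicates="match")
```

In this sketch, re-inserting an identical mapping is a no-op, while a conflicting mapping (same `subject_id` with a different `external_id`, or a new `subject_id` reusing an `external_id`) raises instead of being silently dropped, which is the distinction that `skip_duplicates=True` cannot make.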

Alternative Considerations

This feature was discussed for addressing: https://github.com/datajoint/element-interface/issues/42

Potentially, we could implement this feature in element-interface as a general datajoint utility but not part of datajoint itself. This depends on how clear and common the functionality is.

dimitri-yatsenko, Sep 01 '22 00:09

Hi @dimitri-yatsenko, I'm interested in contributing. I have some experience with Docker, MySQL, and Django. Could you please assign this issue to me? It would be a great help to me as a first-time contributor.

Thank you

MadhuMPandurangi, Sep 08 '22 06:09

@MadhuMPandurangi Thank you for your interest in contributing. The DataJoint team already has ongoing developments to address this issue and we expect to release a fix shortly.

However, you are welcome to suggest your implementation and issue a PR. The DataJoint team will provide detailed and timely feedback. We will merge the optimal features of both solutions.

dimitri-yatsenko, Sep 12 '22 14:09

Thanks for your interest @MadhuMPandurangi! We always appreciate any help you can provide. :smiley:

We welcome all PRs, but I'd suggest having a look at these good-first-issues. We've recently updated them, and they should reflect some easier ones to get started with.

Please let me know if any of those catch your eye and I can assign them to you.

guzman-raphael, Sep 13 '22 23:09