mathesar
mathesar copied to clipboard
Update design for checking for duplicate rows.
Problem
We have implemented a filter to identify duplicate rows. However, after implementing it, we've realized that it doesn't make sense as a filter like our other filters.
Our other filters work the same regardless of the order in which they are applied. This is not the case for "has duplicates". Depending on the other filters applied, we may end up returning different results and confusing the user. We don't face this problem with any other filter because other filters only rely on the data in a specific row. "Has duplicates" relies on all of the rows visible.
As an example, here's a table:
ID | Title | Release Year | Favorite |
---|---|---|---|
1 | Dante's Peak | 1997 | FALSE |
2 | Forrest Gump | 1994 | TRUE |
3 | The Karate Kid | 1984 | TRUE |
4 | Dante's Peak | 1997 | TRUE |
5 | The Karate Kid | 1984 | TRUE |
Order 1
Imagine the user applies filters in this order: "Year" > 1993
, "Favorite" is TRUE
, "Title" has duplicates
Filter 1: "Year" > 1993
ID | Title | Release Year | Favorite |
---|---|---|---|
1 | Dante's Peak | 1997 | FALSE |
2 | Forrest Gump | 1994 | TRUE |
4 | Dante's Peak | 1997 | TRUE |
Filter 2: "Favorite" is TRUE
ID | Title | Release Year | Favorite |
---|---|---|---|
2 | Forrest Gump | 1994 | TRUE |
4 | Dante's Peak | 1997 | TRUE |
Filter 3:"Title" has duplicates
0 results.
Order 2
But instead if the user applies filter in this order "Year" > 1993
,"Title" has duplicates
and "Favorite" is TRUE
Filter 1: "Year" > 1993
ID | Title | Release Year | Favorite |
---|---|---|---|
1 | Dante's Peak | 1997 | FALSE |
2 | Forrest Gump | 1994 | TRUE |
4 | Dante's Peak | 1997 | TRUE |
Filter 2:"Title" has duplicates
ID | Title | Release Year | Favorite |
---|---|---|---|
1 | Dante's Peak | 1997 | FALSE |
4 | Dante's Peak | 1997 | TRUE |
Filter 3: "Favorite" is TRUE
ID | Title | Release Year | Favorite |
---|---|---|---|
4 | Dante's Peak | 1997 | TRUE |
Proposed solution
A few ideas:
- If we want "has duplicates" to be a filter, we need to introduce the concept of ordering and reordering filters and explaining to the user how their previous filters affect future filters. This seems like a complicated concept to introduce for the alpha release so I think it would be better to avoid this solution.
- We could move the functionality to check for duplicates to some other part of the table/view. This could be extended to other functionality in the future – we could support more complicated database queries through this interface.
- We could move the functionality to the Query Builder (see #1065)
- Please note that this needs to be ordered with other steps in the query builder to work properly.
Design notes
- Please note that this also involves updating the user flow when the user tries to set a UNIQUE constraint on a column and they have non-unique values to be consistent with the new design.
Additional context
- "Filters" Concepts page
- Technical discussion:
- #846
- Recent meeting notes
- #853
I'll reiterate how I see a transformation like duplicates-only working in the distant future.
In general, to see only duplicates, you have to perform these steps:
- Look at which rows are unique on some column
- Group them together
- Introduce a new column containing the group-size of that row
- Filter based on whether the group-size is more than 1
At the moment, we are encapsulating all of the above steps into a single monolithic transformation and calling it duplicates_only
.
What I'd like to see happening in the future is that we give the user the general tools to perform above listed operations and he can compose them into the exact above sequence and get the same result. That does require/imply this pipeline-type interface that we've been talking about lately.
This is blocked by #1065 since we'll probably be using the query builder to identify duplicate rows.
I'm marking this as unblocked now that #1065 is completed.
Closing this, too old, requirements are likely out of date. We can create a new issue if we need this functionality in the future.