remove duplicates feature

Open romainds-tech opened this issue 10 months ago • 2 comments

I have a table with a large number of records (more than 440,000).

However, many of them are duplicates. I define two rows as duplicates when they hold the same value in a given column. I can't find a simple way to delete them from the interface. Here is the SQL query I had to run after connecting to the database directly (inside the Docker container):

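-- Number the rows within each group that shares the same value in "column_name".
-- ctid is PostgreSQL's physical row location, so the row kept in each group is
-- the one with the lowest ctid, which is not necessarily the oldest record.
WITH duplicates AS (
  SELECT
    ctid,
    ROW_NUMBER() OVER (
      PARTITION BY "column_name"
      ORDER BY ctid
    ) AS rn
  FROM "schema_name"."table_name"
)
-- Delete every row except the first one in each group.
DELETE FROM "schema_name"."table_name"
WHERE ctid IN (
  SELECT ctid
  FROM duplicates
  WHERE rn > 1
);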
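To check which rows this query keeps before running it against the real database, the sketch below replays the same `ROW_NUMBER()` approach on a throwaway in-memory SQLite table from Python. It is only an illustration. SQLite has no `ctid`, so its implicit `rowid` plays that role. The table and column names (`contacts`, `email`) are hypothetical. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical toy table; "email" stands in for "column_name" in the query above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO contacts (name, email) VALUES (?, ?)",
    [
        ("Ann", "ann@example.com"),
        ("Bob", "bob@example.com"),
        ("Ann (copy)", "ann@example.com"),
        ("Bob (copy)", "bob@example.com"),
        ("Cid", "cid@example.com"),
    ],
)

# Same shape as the PostgreSQL query, with rowid in place of ctid:
# number the rows inside each group of equal emails, then delete every
# row that is not the first one in its group.
conn.execute(
    """
    WITH duplicates AS (
      SELECT rowid AS rid,
             ROW_NUMBER() OVER (PARTITION BY email ORDER BY rowid) AS rn
      FROM contacts
    )
    DELETE FROM contacts
    WHERE rowid IN (SELECT rid FROM duplicates WHERE rn > 1)
    """
)
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT name FROM contacts ORDER BY rowid")]
print(remaining)  # ['Ann', 'Bob', 'Cid']
```

Swapping the `DELETE` for a `SELECT COUNT(*) FROM duplicates WHERE rn > 1` gives a dry run that reports how many rows would be removed.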

If this feature were added to the interface, the grid view could offer a "Delete duplicates" button that asks which column to compare on. I don't know whether the scope of the feature would stop there or also cover other ways of removing duplicates.

romainds-tech • Feb 13 '25 12:02

Removing duplicates can follow several different strategies, and it is a write operation, so this requirement is better implemented as a plugin than as a built-in feature.

We are currently improving the plugin system's infrastructure so that it can support requirements like this flexibly.

tea-artist • Feb 14 '25 04:02

To address this, you could develop a plugin that identifies and removes duplicate records. Here is a high-level approach:

Identify Unique Constraints: Determine which field or fields define a unique record in your dataset.

Scan for Duplicates: Scan the dataset for records that have identical values in those fields.

Remove Duplicates: Once duplicates are identified, delete the redundant records, keeping only one instance of each unique record.

User Confirmation: Include a confirmation step before deletion to prevent accidental data loss.
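The four steps above can be sketched without relying on any plugin API. The example below makes a few assumptions. Each record is a plain dict with an `id` and a `fields` mapping, which is a hypothetical shape and not Teable's actual record format. Records count as duplicates when all the chosen key fields match. Either the first or the last occurrence is kept. The function only returns the IDs to delete, so that a confirmation step can run before anything is removed.

```python
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]  # hypothetical shape: {"id": ..., "fields": {...}}


def find_duplicates(
    records: Sequence[Record], key_fields: Sequence[str], keep: str = "first"
) -> List[str]:
    """Return the IDs of records to delete, keeping one record per key."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # Scan in the chosen order; whichever copy is seen first is the one kept.
    ordered = records if keep == "first" else list(reversed(records))
    seen = set()
    to_delete: List[str] = []
    for record in ordered:
        # The key is the tuple of values of the uniqueness fields.
        # Values must be hashable (str, int, ...); list-valued fields would need converting.
        key: Tuple[Any, ...] = tuple(record["fields"].get(f) for f in key_fields)
        if key in seen:
            to_delete.append(record["id"])  # a redundant copy
        else:
            seen.add(key)
    return to_delete


def confirm(ids: List[str]) -> bool:
    # Ask before deleting anything (interactive, so it is not called below).
    answer = input(f"Delete {len(ids)} duplicate record(s)? [y/N] ")
    return answer.strip().lower() == "y"


records = [
    {"id": "rec1", "fields": {"email": "ann@example.com", "name": "Ann"}},
    {"id": "rec2", "fields": {"email": "bob@example.com", "name": "Bob"}},
    {"id": "rec3", "fields": {"email": "ann@example.com", "name": "Ann"}},
]
print(find_duplicates(records, ["email"]))               # ['rec3']
print(find_duplicates(records, ["email"], keep="last"))  # ['rec1']
```

In a real plugin, `confirm` would be replaced by the plugin's own dialog, and the returned IDs would be passed to whatever delete call the plugin API actually provides. That call is deliberately not shown here.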

Plugin Implementation:

Since Teable supports plugins, you can build a custom one by following Teable's plugin development guidelines. The plugin can be tailored to your requirements, including what counts as a duplicate and how duplicates should be handled.

Alternative Approaches:

If developing a plugin is not feasible, consider exporting the data, removing the duplicates externally with a tool such as Python or Excel, and then importing the cleaned data back into Teable.
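For the export, clean and re-import route, Python's standard `csv` module is enough. This sketch assumes the export is a CSV file with a header row and that duplicates are identified by a single column whose name you pass in. The column name `email` is hypothetical. For each value, the sketch keeps the first row and copies all kept rows through unchanged.

```python
import csv
import io
from typing import TextIO


def dedupe_csv(src: TextIO, dst: TextIO, key_column: str) -> int:
    """Copy rows from src to dst, skipping repeats of key_column. Returns rows dropped."""
    reader = csv.DictReader(src)
    if reader.fieldnames is None or key_column not in reader.fieldnames:
        raise ValueError(f"column {key_column!r} not found in header")
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    seen = set()
    dropped = 0
    for row in reader:
        value = row[key_column]
        if value in seen:
            dropped += 1  # a later copy of a value we already kept
            continue
        seen.add(value)
        writer.writerow(row)
    return dropped


# Usage on an in-memory "export" standing in for the real file.
exported = io.StringIO("name,email\nAnn,ann@example.com\nBob,bob@example.com\nAnn,ann@example.com\n")
cleaned = io.StringIO()
dropped = dedupe_csv(exported, cleaned, "email")
print(dropped)  # 1
```

For real files, open both with `newline=""` as the `csv` documentation recommends, for example `with open("export.csv", newline="", encoding="utf-8") as src, open("cleaned.csv", "w", newline="", encoding="utf-8") as dst:`. If pandas is available, `DataFrame.drop_duplicates(subset=["email"], keep="first")` does the same in one call.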

Conclusion:

While Teable does not currently offer a built-in "remove duplicates" feature, its plugin system offers a way to implement one that fits your specific needs.

@romainds-tech

QuantumAlchemist03 • May 01 '25 19:05