Remove duplicates feature
I have a table with a large number of entries (440,000+).
Many of them are duplicates (I define a duplicate as two rows that have the same value in a given column), and I can't find a simple way to delete them from the interface. Here is an example of the SQL query I had to run after connecting directly to the database (in the Docker container):
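```sql
-- Keep one row per value of "column_name" and delete the others.
-- ctid is the row's physical location, so the row kept is the one with the
-- lowest ctid, which is not necessarily the oldest record.
-- PARTITION BY groups NULLs together, so rows with a NULL "column_name"
-- are also treated as duplicates of each other.
WITH duplicates AS (
    SELECT
        ctid,
        ROW_NUMBER() OVER (
            PARTITION BY "column_name"
            ORDER BY ctid
        ) AS rn
    FROM "schema_name"."table_name"
)
DELETE FROM "schema_name"."table_name"
WHERE ctid IN (
    SELECT ctid
    FROM duplicates
    WHERE rn > 1
);
```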
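To try the same `ROW_NUMBER()` approach without a PostgreSQL instance, here is a minimal sketch on a toy in-memory SQLite table using Python's built-in `sqlite3` module. It assumes SQLite 3.25 or later (needed for window functions); the table `items` and column `name` are hypothetical, and SQLite's implicit `rowid` stands in for PostgreSQL's `ctid`. It also counts the affected rows before deleting, as a dry run:

```python
import sqlite3

# Toy table standing in for the real one ("items" and "name" are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",), ("a",)],
)

# Same idea as the PostgreSQL query: number the rows inside each group of equal
# values and keep row 1. SQLite's rowid plays the role of PostgreSQL's ctid.
DUPLICATES_CTE = """
WITH duplicates AS (
    SELECT rowid AS rid,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY rowid) AS rn
    FROM items
)
"""

# Dry run: count the rows that would be deleted before touching anything.
to_delete = conn.execute(
    DUPLICATES_CTE + "SELECT COUNT(*) FROM duplicates WHERE rn > 1"
).fetchone()[0]
print(f"{to_delete} duplicate rows found")  # 3 duplicate rows found

# Delete every row except the first of each group; "with conn" commits on success.
with conn:
    conn.execute(
        DUPLICATES_CTE
        + "DELETE FROM items WHERE rowid IN (SELECT rid FROM duplicates WHERE rn > 1)"
    )

remaining = [row[0] for row in conn.execute("SELECT name FROM items ORDER BY rowid")]
print(remaining)  # ['a', 'b', 'c']
```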
If this feature were added to the interface, the grid view could offer a 'Delete duplicates' button where you choose the column to compare. I don't know whether the scope of this feature would stop there or also cover other ways of deleting duplicates.
Removing duplicates can follow several different strategies, and it is a "write" operation, so this requirement is better implemented through plugins.
We are improving the infrastructure of the plugin system so that it can support requirements like this flexibly.
To address this, you could develop a plugin that identifies and removes duplicate records. Here's a high-level approach:
1. Identify Key Fields: Determine which fields define a unique record in your dataset.
2. Scan for Duplicates: Find the records that have identical values in those fields.
3. Remove Duplicates: Delete the redundant records, keeping only one instance of each unique record.
4. Ask for Confirmation: Show what will be deleted and require confirmation first, to prevent accidental data loss.
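The four steps above can be sketched in plain Python, independent of any plugin API. This is a minimal sketch under stated assumptions: records are hypothetical dicts with an `id` key, `find_duplicates` and `remove_duplicates` are illustrative names rather than Teable functions, and the `confirm` callback stands in for whatever confirmation dialog a real plugin would show. The `keep` parameter covers two of the possible strategies (retain the first or the last occurrence):

```python
from typing import Any, Callable

# Hypothetical record shape: a plain dict with an "id" plus field values.
# This is NOT Teable's plugin API, only an illustration of the logic.
Record = dict[str, Any]


def find_duplicates(records: list[Record], key_fields: list[str], keep: str = "first") -> list[Any]:
    """Return the ids of redundant records, retaining one record per key.

    key_fields: the fields that together define a unique record.
    keep: which occurrence to retain, "first" or "last".
    Field values must be hashable (e.g. str, int, None).
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    kept: dict[tuple, Any] = {}  # key -> id of the record currently retained
    redundant: list[Any] = []
    for record in records:
        # Records with identical values in all key fields share the same key.
        key = tuple(record.get(field) for field in key_fields)
        if key not in kept:
            kept[key] = record["id"]
        elif keep == "first":
            redundant.append(record["id"])
        else:
            # Keep the latest occurrence: the previously retained one becomes redundant.
            redundant.append(kept[key])
            kept[key] = record["id"]
    return redundant


def remove_duplicates(
    records: list[Record],
    key_fields: list[str],
    confirm: Callable[[int], bool],
    keep: str = "first",
) -> list[Record]:
    """Drop redundant records, but only after confirm(count) returns True."""
    redundant = set(find_duplicates(records, key_fields, keep))
    if not redundant or not confirm(len(redundant)):
        return records  # nothing to remove, or the user cancelled
    return [record for record in records if record["id"] not in redundant]


rows = [
    {"id": "rec1", "email": "a@example.com", "name": "Ann"},
    {"id": "rec2", "email": "b@example.com", "name": "Bob"},
    {"id": "rec3", "email": "a@example.com", "name": "Ann"},
    {"id": "rec4", "email": "a@example.com", "name": "Annie"},
]
print(find_duplicates(rows, ["email"]))               # ['rec3', 'rec4']
print(find_duplicates(rows, ["email", "name"]))       # ['rec3']
print(find_duplicates(rows, ["email"], keep="last"))  # ['rec1', 'rec3']
cleaned = remove_duplicates(rows, ["email"], confirm=lambda n: True)
print([r["id"] for r in cleaned])                     # ['rec1', 'rec2']
```

Returning ids instead of deleting right away keeps the scan separate from the write, so a plugin could list the affected records before asking for confirmation.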
Plugin Implementation:
Since Teable supports plugin development, you can create a custom plugin by following Teable's plugin development guidelines. The plugin can be tailored to your specific requirements, including what counts as a duplicate and how duplicates should be handled.
Alternative Approaches:
If developing a plugin is not feasible, consider exporting the data, removing the duplicates externally with a tool such as Python or Excel, and re-importing the cleaned data into Teable.
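As a sketch of the external route, assuming the table has been exported to a CSV file with a header row, Python's standard `csv` module is enough to keep the first row for each value of one column. The function name `dedupe_csv` and the file names in the comment are placeholders. Values are compared as exact strings, so no trimming or case-folding is applied:

```python
import csv
import io
from typing import TextIO


def dedupe_csv(src: TextIO, dst: TextIO, column: str) -> int:
    """Copy CSV rows from src to dst, keeping the first row for each value of `column`.

    Values are compared as exact strings (no trimming or case-folding).
    Returns the number of rows dropped.
    """
    reader = csv.DictReader(src)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in the CSV header")
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    seen: set[str] = set()
    dropped = 0
    for row in reader:
        value = row[column]
        if value in seen:
            dropped += 1  # a later duplicate: skip it
            continue
        seen.add(value)
        writer.writerow(row)
    return dropped


# In-memory demo. With real files, open them with newline="", e.g.
#   with open("export.csv", newline="", encoding="utf-8") as src, \
#        open("deduped.csv", "w", newline="", encoding="utf-8") as dst:
#       dedupe_csv(src, dst, "column_name")
exported = io.StringIO("name,city\nAnn,Paris\nBob,Lyon\nAnn,Nice\n")
cleaned = io.StringIO()
print(dedupe_csv(exported, cleaned, "name"))  # 1
print(cleaned.getvalue())
```

The resulting file can then be re-imported into Teable as a new table.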
Conclusion:
While Teable does not currently offer a built-in "remove duplicates" feature, its plugin system provides a way to implement this functionality according to your specific needs.
@romainds-tech