bety icon indicating copy to clipboard operation
bety copied to clipboard

Create generic function for updating / removing duplicates

Open dlebauer opened this issue 9 years ago • 6 comments

When implementing uniqueness constraints, duplicate records arise

Need a function (in sql or r) that will do the following:

  • identify dependent tables
  • make all referents point to same record
  • delete duplicate

The arguments to the function would be tablename, record and duplicate_record

To identify all referring tables:

SELECT
    TABLE_NAME
FROM
    information_schema. COLUMNS
WHERE
    table_schema = 'public'
AND COLUMN_NAME = 'citation_id'
AND TABLE_NAME NOT LIKE '%view%';

Here is some pseudo-code

referents <- select table_name from ... (see above)

function(table, id, duplicate_id){
  fk <- paste(table, '_id')
  for (referent in referents)
    update referent set  fk = id where fk = duplicate_id
  delete from table where id = duplicate_id

dlebauer avatar Jan 26 '15 17:01 dlebauer