datapusher-plus

Fast upsert mode

Open jqnatividad opened this issue 2 years ago • 0 comments

Currently, DP+, like Datapusher and xloader, only does drop & replace; it doesn't do upserts.

It'd be great if DP+ could support upserts in a performant way.

This can be done by:

  • adding a resource-level metadata field that the Data Publisher can set to enable upsert mode.
  • when a resource has upsert mode enabled, instead of drop & replace, DP+ will:
    • compare the schemas of the existing resource and the new CSV to see if they are identical (qsv can do this very quickly)
    • if they're not, DP+ will abort stating that the resource is in upsert mode and the schemas do not match
    • if the schemas are identical, do a PostgreSQL copy to a temporary table of the file to be pushed
    • then do an INSERT INTO ... ON CONFLICT DO UPDATE to upsert the temporary table into the existing resource
    • the temporary table is then deleted
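
The steps above could be sketched roughly as follows. This is only an illustration, not DP+ code. The function names, the temporary table name `dpp_upsert_tmp`, and the `key_columns` parameter are all hypothetical. The schema check here compares header names only, using Python's `csv` module as a stand-in for qsv, and does not check column types. PostgreSQL's `ON CONFLICT (...)` needs a unique index or constraint on the conflict columns, so the sketch assumes the existing resource has one. The issue does not say which columns would serve as the key.

```python
import csv
import io


def csv_header(csv_text: str) -> list[str]:
    """Return the column names from the first row of a CSV document."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_upsert_sql(table, columns, key_columns, temp_table="dpp_upsert_tmp"):
    """Build the temp-table upsert statements (all names are hypothetical)."""
    tgt, tmp = quote_ident(table), quote_ident(temp_table)
    cols = ", ".join(quote_ident(c) for c in columns)
    keys = ", ".join(quote_ident(c) for c in key_columns)
    # Every non-key column is overwritten with the incoming (EXCLUDED) value.
    updates = ", ".join(
        f"{quote_ident(c)} = EXCLUDED.{quote_ident(c)}"
        for c in columns
        if c not in key_columns
    )
    # If every column is part of the key there is nothing to update.
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    return [
        # 1. Temporary table with the same columns as the existing resource.
        f"CREATE TEMP TABLE {tmp} (LIKE {tgt} INCLUDING DEFAULTS)",
        # 2. Bulk-load the pushed file; the CSV is streamed on STDIN.
        f"COPY {tmp} ({cols}) FROM STDIN WITH (FORMAT csv, HEADER true)",
        # 3. Upsert the temporary rows into the existing resource.
        f"INSERT INTO {tgt} ({cols}) SELECT {cols} FROM {tmp} "
        f"ON CONFLICT ({keys}) {action}",
        # 4. Remove the temporary table.
        f"DROP TABLE {tmp}",
    ]


def plan_upsert(existing_columns, csv_text, table, key_columns):
    """Abort on a schema mismatch, otherwise return the SQL to run in order."""
    new_columns = csv_header(csv_text)
    if new_columns != list(existing_columns):
        raise ValueError(
            "resource is in upsert mode and the schemas do not match"
        )
    return build_upsert_sql(table, new_columns, key_columns)
```

The returned statements would be run in order inside one transaction by whatever PostgreSQL driver is in use, with the CSV file streamed to the `COPY ... FROM STDIN` step. Building the SQL separately from executing it keeps the schema check and statement generation easy to test without a database.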

jqnatividad, May 04 '22 22:05