datapusher-plus
Fast upsert mode
Currently, DP+, like Datapusher and xloader, only does drop & replace and doesn't support upserts.
It'd be great if DP+ could support upserts in a performant way.
This can be done by:
- adding a resource-level metadata field that the Data Publisher can set to enable upsert mode.
- when a resource has upsert mode enabled, instead of drop & replace, DP+ will:
  - compare the schemas of the existing resource and the new CSV to see if they are identical (qsv can do this very quickly)
  - if they're not, abort with an error stating that the resource is in upsert mode and the schemas do not match
  - if the schemas are identical, do a PostgreSQL COPY of the file to be pushed into a temporary table
  - then do an INSERT INTO ... ON CONFLICT DO UPDATE to upsert the temporary table into the existing resource
  - finally, drop the temporary table
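
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not DP+ code: the helper names (`read_header`, `schemas_match`, `build_upsert_statements`), the `_upsert_tmp` suffix, and the `key_columns` argument are all hypothetical. The schema check here compares header names only, as a stand-in for the qsv-based comparison, and the proposal does not say which columns identify a row, so the sketch assumes the target table already has a unique constraint on `key_columns` (PostgreSQL requires one for `ON CONFLICT`).

```python
from __future__ import annotations

import csv
import io


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def read_header(csv_text: str) -> list[str]:
    """Return the column names from the first row of a CSV document."""
    return next(csv.reader(io.StringIO(csv_text)))


def schemas_match(existing_columns: list[str], new_columns: list[str]) -> bool:
    """Name-and-order comparison only; a real check would also compare types."""
    return list(existing_columns) == list(new_columns)


def build_upsert_statements(
    table: str, columns: list[str], key_columns: list[str]
) -> list[str]:
    """Build the SQL for: temp table, COPY, upsert, cleanup.

    Assumes `table` has a unique constraint covering `key_columns`.
    """
    if not key_columns:
        raise ValueError("upsert needs at least one key column")

    target = quote_ident(table)
    temp = quote_ident(table + "_upsert_tmp")  # hypothetical naming scheme
    col_list = ", ".join(quote_ident(c) for c in columns)
    key_list = ", ".join(quote_ident(c) for c in key_columns)

    # Every non-key column is overwritten with the incoming value.
    updates = ", ".join(
        f"{quote_ident(c)} = EXCLUDED.{quote_ident(c)}"
        for c in columns
        if c not in key_columns
    )
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"

    return [
        f"CREATE TEMP TABLE {temp} (LIKE {target} INCLUDING DEFAULTS)",
        f"COPY {temp} ({col_list}) FROM STDIN WITH (FORMAT csv, HEADER true)",
        f"INSERT INTO {target} ({col_list}) SELECT {col_list} FROM {temp} "
        f"ON CONFLICT ({key_list}) {action}",
        f"DROP TABLE {temp}",
    ]


if __name__ == "__main__":
    new_columns = read_header("id,name,amount\n1,widget,2\n")
    existing_columns = ["id", "name", "amount"]  # would come from the datastore
    if not schemas_match(existing_columns, new_columns):
        raise SystemExit("resource is in upsert mode and the schemas do not match")
    for statement in build_upsert_statements("my_resource", new_columns, ["id"]):
        print(statement + ";")
```

The `COPY ... FROM STDIN` statement has to be fed the file by the database driver; with psycopg2, for example, that would be `cursor.copy_expert(sql, file_obj)`. Running all four statements in one transaction would keep the existing resource untouched if any step fails.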