postgres_scanner
postgres_scanner copied to clipboard
Parallel COPY to Postgres ?
What happens?
Hello, I tried to do a parallel import in Postgres. I wonder if postgres_scanner allows this ?
To Reproduce
I tried :
ATTACH 'user=postgres host=/var/run/postgresql port=5440' AS db (TYPE postgres);
COPY db.votes2 FROM 'data.zstd/*' (FORMAT parquet);
I tried to change file format (csv, parquet), compression, multiples files... I also tried to first import in duckdb, then, copy to Postgres. I only see one COPY process on Postgres.
OS:
Linux
PostgreSQL Version:
16
DuckDB Version:
v0.10.0 20b1486d11
DuckDB Client:
CLI
Full Name:
Adrien Nayrat
Affiliation:
Self Employed
Have you tried this on the latest main branch?
- [X] I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- [X] I agree
Thanks for the report! Parallel imports are not supported right now. It is also rather hard (and perhaps not even possible?) to support parallel imports in a transaction-safe manner. We could perhaps offer a bulk-import option that does not respect transactions in the future.
Thanks for your quick reply ! Indeed, it is a good idea for the bulk import without taking care of transaction. FYI, it seems possible to perform the copy thanks to snapshot export :
| Session 1 | Session 2 |
|---|---|
| BEGIN; | |
| SELECT pg_export_snapshot(); pg_export_snapshot --------------------- 00000003-00000016-1 (1 row) |
|
copy t1 from 'file.copy' ; COPY 10 |
BEGIN ISOLATION LEVEL REPEATABLE READ; |
| SET TRANSACTION SNAPSHOT '00000003-00000016-1'; | |
| copy t1 from 'file.copy' ; COPY 10 |
|
| COMMIT; | |
| COMMIT; | |
| select count(*) from votes2; count ------- 20 (1 row) |
With such feature, DBT could be a very good tool to load massive amount of data in Postgres. It could be my new favorite ELT/ETL :)