postgres_scanner
Postgres Writer dropping data - pushing incomplete data from parquet
What happens?
I have a simple Parquet file with two columns (bigint and varchar[] in Postgres; INT64 and BYTE_ARRAY in Parquet).
When I write the data to Postgres using the Postgres connector, data is lost: not all of it reaches Postgres. I can query the Parquet file successfully in DuckDB itself, and a CSV export also works.
To Reproduce
ATTACH 'dbname=<dbname> port=<port> user=<user> host=<host> password=<pass>' AS db (TYPE POSTGRES);
SELECT * FROM 'https://github.com/arpit94/duckdb/raw/main/data/parquet-testing/npi.parquet' where npi = 1003000126;
┌────────────┬────────────────────┐
│    npi     │ primary_taxo_codes │
│   int64    │     varchar[]      │
├────────────┼────────────────────┤
│ 1003000126 │ [207R00000X]       │
└────────────┴────────────────────┘
CREATE OR REPLACE TABLE db.public.my_table as FROM 'https://github.com/arpit94/duckdb/raw/main/data/parquet-testing/npi.parquet';
SELECT * FROM db.public.my_table where npi = 1003000126;
┌────────────┬────────────────────┐
│    npi     │ primary_taxo_codes │
│   int64    │     varchar[]      │
├────────────┼────────────────────┤
│ 1003000126 │                    │
└────────────┴────────────────────┘
The same data survives a round trip through CSV:
COPY (SELECT * FROM 'https://github.com/arpit94/duckdb/raw/main/data/parquet-testing/npi.parquet') TO 'output.csv' (HEADER, DELIMITER ',');
SELECT * FROM 'output.csv' WHERE npi = 1003000126;
┌────────────┬────────────────────┐
│    npi     │ primary_taxo_codes │
│   int64    │      varchar       │
├────────────┼────────────────────┤
│ 1003000126 │ [207R00000X]       │
└────────────┴────────────────────┘
OS:
Ubuntu
DuckDB Version:
1.0.0
DuckDB Client:
CLI tool
Full Name:
Arpit Aggarwal
Affiliation:
Candor Health
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
- [X] Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- [X] Yes, I have