pg_duckdb Case insensitivity mismatch between duckdb and postgres can cause issues

trafficstars

DuckDB considers identifiers with different casing the same even if they are quoted.

The most minimal example of this is the following schema:

CREATE TABLE a(b int);
CREATE TABLE "A"(c int);
INSERT INTO a VALUES(1);

When using postgres execution this correctly outputs:

> SELECT * FROM a, "A" as aa;
 b │ c
───┼───
(0 rows)

But when enabling duckdb execution both a and "A" are interpreted as a. Resulting in the following incorrect result:

> SELECT * FROM a, "A" as aa;
 b │ b
───┼───
 1 │ 1
(1 row)

This happens even when I set preserver_identifier_case to true on the DuckDB connection.

Aug 13 '24 16:08 JelteF

Good find. Some related discussion here: https://github.com/duckdb/pg_duckdb/issues/56#issuecomment-2270785294, and related to #43 -- though these are not specific to case insensitivity.

Aug 13 '24 16:08 wuputah

This may be related, but in either of the two below, the field name cases aren't maintained in the new postgres table. Using column names in double quotes instead of a wildcard also does not resolve the issue

CREATE TABLE postgres_staging.public.{table_name} AS SELECT * FROM postgres_prod.public.{table_name}

duckdb.sql(f'CREATE TABLE {table_name} AS SELECT * FROM postgres_prod.public.{table_name}')
duckdb.sql(f'CREATE TABLE postgres_staging.public.{table_name} AS SELECT * FROM {table_name}')

Aug 25 '24 21:08 Alzavio

This shouldn't be in the long term milestone. but an important bug to fix.

Oct 29 '24 17:10 pantonis

@pantonis we'll try to address this asap.

Oct 30 '24 08:10 Y--

Based on some research I did this should be quite easy to address, postgres_scanner is able to do this. I'm pretty sure it's a few lines of code only. So we can use the same approach as is used there. Let's wait until #477 is merged though, to avoid any merge conflicts.

Dec 18 '24 15:12 JelteF

We had the same issue when we were using Hydra and it was fixed quite fast I can say.

Dec 18 '24 16:12 pantonis

This seems to be solved in the current version.

We had the same issue when we were using Hydra and it was fixed quite fast I can say.

Dec 19 '24 01:12 Z-Xiao-M

If it's indeed resolved we should add some tests to make sure we don't regress again.

Dec 19 '24 14:12 JelteF

Not sure if it's related, but it seems you can't change the casing of the names with as if the name is upper when reading from parquet.

create a parquet, like

duckdb -c "COPY (select generate_series as \"UPPER\", generate_series as \"lower\" FROM generate_series(100)) TO '/tmp/tmp.parquet' (FORMAT 'parquet')"

then from the postgresql

select * from read_parquet('/tmp/tmp.parquet') as t(upper int8, lower int8) limit 2;
 UPPER | lower
-------+-------
     0 |     0
     1 |     1
(2 rows)

select upper as "upper", lower as "LOWER" from read_parquet('/tmp/tmp.parquet') as t(upper int8, lower int8) limit 2;
 UPPER | LOWER
-------+-------
     0 |     0
     1 |     1
(2 rows)

select upper as "upper_", lower as "LOWER_" from read_parquet('/tmp/tmp.parquet') as t(upper int8, lower int8) limit 2;
 upper_ | LOWER_
--------+--------
      0 |      0
      1 |      1
(2 rows)

can't rename UPPER to lowercase. If you change the name it works fine.

interestingly it only happens UPPERCASE to lowercase, not the opposite as shown in the lower column

let me know if I should open a new issue

Jan 28 '25 17:01 CaselIT

@CaselIT can you try that new again with the current main branch? #531 might very well change the behavior you're describing.

Jan 28 '25 17:01 JelteF

just tried on docker.io/pgduckdb/pgduckdb:17-main and I get the same result. I originally were using 0.2

Jan 28 '25 17:01 CaselIT

Alright, can you create a separate issue for this?

Jan 28 '25 18:01 JelteF

@JelteF ok, I've crated #564

Jan 28 '25 18:01 CaselIT

pg_duckdb pg_duckdb copied to clipboard

Case insensitivity mismatch between duckdb and postgres can cause issues

pg_duckdb
pg_duckdb copied to clipboard