postgres_scanner icon indicating copy to clipboard operation
postgres_scanner copied to clipboard

Inconsistent Column Name Quoting in DuckDB with PostgreSQL Integration

Open camargocr opened this issue 7 months ago • 2 comments

What happens?

When using the DuckDB package in Python to import a CSV file into PostgreSQL, I've observed inconsistent behavior regarding column names:

  • Lower Case Column Titles: if the CSV file's header (column titles) is in lower case, the corresponding column names in the created PostgreSQL table appear without quotes.

  • Upper Case Column Titles: if the CSV file's header is in upper case, the corresponding column names in the PostgreSQL table are enclosed in quotes, even when the preserve_identifier_case setting is set to False.

This behavior appears inconsistent and may indicate a bug. Below is the Python script used for testing this issue.

To Reproduce


#!/usr/bin/env python
# coding: utf-8

import duckdb as db

con = db.connect()
con.install_extension("postgres")
con.load_extension("postgres")
con.execute("set preserve_identifier_case = false;")


sql_cmd = """
attach 'host=localhost 
        port=5432 
        dbname=db
        user=jonhdoe 
        password=marydoe' as pg (type postgres);
"""
con.sql(sql_cmd)

sql_cmd = """
drop table if exists pg.list;
create table pg.list_lower as 
select * from 'test_lower.csv';
"""
con.sql(sql_cmd)

sql_cmd = """
drop table if exists pg.list;
create table pg.list_upper as 
select * from 'test_upper.csv';
"""
con.sql(sql_cmd)

con.close()
CUST_ID,START_DATE,END_DATE,TRANS_ID,DATE,YEAR,MONTH,DAY,EXP_TYPE,AMOUNT
CI6XLYUMQK,2015-05-01,,T8I9ZB5A6X90UG8,2015-09-11,2015,9,11,Motor/Travel,20.27
CI6XLYUMQK,2015-05-01,,TZ4JSLS7SC7FO9H,2017-02-08,2017,2,8,Motor/Travel,12.85
CI6XLYUMQK,2015-05-01,,TTUKRDDJ6B6F42H,2015-08-01,2015,8,1,Housing,383.8
CI6XLYUMQK,2015-05-01,,TDUHFRUKGPPI6HD,2019-03-16,2019,3,16,Entertainment,5.72
CI6XLYUMQK,2015-05-01,,T0JBZHBMSVRFMMD,2015-05-15,2015,5,15,Entertainment,11.06
CI6XLYUMQK,2015-05-01,,TZBO76X87ZUWBJ0,2016-02-22,2016,2,22,Motor/Travel,20.37
CI6XLYUMQK,2015-05-01,,T8YEYMHF208963N,2015-08-20,2015,8,20,Entertainment,15.97
CI6XLYUMQK,2015-05-01,,TQNUZKC8K4W0S9H,2016-07-16,2016,7,16,Entertainment,17.46
CI6XLYUMQK,2015-05-01,,TQJ3BTAB7E449ZN,2018-12-23,2018,12,23,Groceries,26.77
CI6XLYUMQK,2015-05-01,,TMFITCZBN0YK9FM,2020-12-02,2020,12,2,Entertainment,8.24
CI6XLYUMQK,2015-05-01,,TEDY6MR01ELTHWY,2018-02-25,2018,2,25,Groceries,9.27
CI6XLYUMQK,2015-05-01,,TRUQRD6ME4A13I0,2019-04-15,2019,4,15,Groceries,8.97
CI6XLYUMQK,2015-05-01,,T13LUKGW0O8SAIQ,2017-01-20,2017,1,20,Motor/Travel,23.95
CI6XLYUMQK,2015-05-01,,T8DBBZQXIR0X93O,2015-06-14,2015,6,14,Groceries,8.24
CI6XLYUMQK,2015-05-01,,T4H2B228GNWC4ES,2018-03-27,2018,3,27,Entertainment,9.58
CI6XLYUMQK,2015-05-01,,T0TGOW0T3504WJ3,2020-05-06,2020,5,6,Groceries,17.44
cust_id,start_date,end_date,trans_id,date,year,month,day,exp_type,amount
CI6XLYUMQK,2015-05-01,,T8I9ZB5A6X90UG8,2015-09-11,2015,9,11,Motor/Travel,20.27
CI6XLYUMQK,2015-05-01,,TZ4JSLS7SC7FO9H,2017-02-08,2017,2,8,Motor/Travel,12.85
CI6XLYUMQK,2015-05-01,,TTUKRDDJ6B6F42H,2015-08-01,2015,8,1,Housing,383.8
CI6XLYUMQK,2015-05-01,,TDUHFRUKGPPI6HD,2019-03-16,2019,3,16,Entertainment,5.72
CI6XLYUMQK,2015-05-01,,T0JBZHBMSVRFMMD,2015-05-15,2015,5,15,Entertainment,11.06
CI6XLYUMQK,2015-05-01,,TZBO76X87ZUWBJ0,2016-02-22,2016,2,22,Motor/Travel,20.37
CI6XLYUMQK,2015-05-01,,T8YEYMHF208963N,2015-08-20,2015,8,20,Entertainment,15.97
CI6XLYUMQK,2015-05-01,,TQNUZKC8K4W0S9H,2016-07-16,2016,7,16,Entertainment,17.46
CI6XLYUMQK,2015-05-01,,TQJ3BTAB7E449ZN,2018-12-23,2018,12,23,Groceries,26.77
CI6XLYUMQK,2015-05-01,,TMFITCZBN0YK9FM,2020-12-02,2020,12,2,Entertainment,8.24
CI6XLYUMQK,2015-05-01,,TEDY6MR01ELTHWY,2018-02-25,2018,2,25,Groceries,9.27
CI6XLYUMQK,2015-05-01,,TRUQRD6ME4A13I0,2019-04-15,2019,4,15,Groceries,8.97
CI6XLYUMQK,2015-05-01,,T13LUKGW0O8SAIQ,2017-01-20,2017,1,20,Motor/Travel,23.95
CI6XLYUMQK,2015-05-01,,T8DBBZQXIR0X93O,2015-06-14,2015,6,14,Groceries,8.24
CI6XLYUMQK,2015-05-01,,T4H2B228GNWC4ES,2018-03-27,2018,3,27,Entertainment,9.58
CI6XLYUMQK,2015-05-01,,T0TGOW0T3504WJ3,2020-05-06,2020,5,6,Groceries,17.44

OS:

MacOS X sequoia 15.5 M2 aarch64

DuckDB Version:

v1.2.2 7c039464e4

DuckDB Client:

1.3.0

Hardware:

M2

Full Name:

Carlos Rogerio Camargo

Affiliation:

Universidade Federal de Santa Catarina

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • [x] Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • [x] Yes, I have

camargocr avatar May 28 '25 12:05 camargocr

Thanks for the report!

This seems like intended behavior.

To clarify, what preserve_identifier_case does is if you use an unquoted identifier in SQL it will convert it to lowercase. This only applies to the SQL layer. It does not affect names read from other sources, such as from CSV files or tables.

Without quotes Postgres converts identifiers to lowercase automatically, which would lose casing information. As such quotes are added for non-lowercase identifiers.

It seems like what you want is a setting in the Postgres client that will ignore casing when converting data from DuckDB to Postgres. Perhaps you can make a feature request in the duckdb-postgres repo for that feature?

Mytherin avatar May 28 '25 12:05 Mytherin

Hi Mytherin, thanks for the reply.

You're probably right -- maybe it's intentional behavior, but I'm still not sure. The external source -- the CSV, in this case --has no quotes in either scenario (and the column names don't contain any spaces), so I figured the resulting table wouldn’t have quotes. But, ok, I get your reasoning.

Thanks a lot!

camargocr avatar May 28 '25 18:05 camargocr