ducklake icon indicating copy to clipboard operation
ducklake copied to clipboard

ducklake_add_data_files does not populate hive-partition virtual columns

Open swasheck opened this issue 5 months ago • 4 comments

I am trying to add existing, hive-partitioned parquet files from an s3 bucket to ducklake. These parquet files were created using duckdb following the documentation:

COPY metrics TO 'metrics'
(FORMAT parquet, PARTITION_BY (year, month,resource_name), OVERWRITE_OR_IGNORE);

now that i'm trying to manage these under ducklake i create the table

--- load catalog stuff here

create table metrics (
collectiondate timestamp
,metricname varchar
,metricvalue float
,"year" int 
,"month" int 
,resource_name varchar
);

alter table metrics 
  set partitioned by ("year","month",resource_name);

now ...

CALL ducklake_add_data_files('my_test_catalog'
	,'metrics'
	,'s3://bucket/warehouse/metrics/year=2025/month=6/resource_name=server/data_0.parquet'
	,allow_missing => true
);

loads the file, but when I run

select * 
from metrics 

year,month, and resource_name are all null.

Excluding the columns from the table ddl fails the partitioning scheme since I cannot partition on a column that does not exist in the table.

edit: ducklake version: d066f91 duckdb version: 1.3.2

swasheck avatar Jul 19 '25 04:07 swasheck

Theres already a PR for adding hive partitioned data files https://github.com/duckdb/ducklake/pull/252 , will be merged after duckdb v1.4 is released

gahtan-syarif avatar Jul 19 '25 09:07 gahtan-syarif

@swasheck could you check if the fix of https://github.com/duckdb/ducklake/pull/252 works for you? You can install try installing from core nightly

guillesd avatar Aug 04 '25 07:08 guillesd

@guillesd i apologize for the delay as i have been on summer holiday.

yes. the issue persists.

duckdb 1.3.2 ducklake (core_nightly): 10e2b8a

FORCE INSTALL ducklake from core_nightly;
LOAD ducklake;
update extensions; -- ducklake	core_nightly	NO_UPDATE_AVAILABLE	10e2b8a	10e2b8a

everything else is the same as my initial report.

is there a specific implementation path to follow?

swasheck avatar Aug 19 '25 19:08 swasheck

Hi @swasheck this is my fault, core nightly is still not pointing at 0.3-dev. Also, you need DuckDB 1.4-dev to check. We are going to push 0.3-dev to core nightly and you can install 1.4-dev from here. I did test this fix on my side and it works!

guillesd avatar Aug 20 '25 12:08 guillesd