ducklake_add_data_files does not populate hive-partition virtual columns
I am trying to add existing, Hive-partitioned Parquet files from an S3 bucket to DuckLake. These Parquet files were created with DuckDB, following the documentation:
COPY metrics TO 'metrics'
(FORMAT parquet, PARTITION_BY (year, month, resource_name), OVERWRITE_OR_IGNORE);
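For reference, the partition values written this way live in the directory names, not inside the Parquet files themselves; DuckDB can recover them when reading the files back because `read_parquet` parses Hive-style paths. A sketch (the glob pattern assumes the three-level `year/month/resource_name` layout produced above):

```sql
-- hive_partitioning = true tells DuckDB to reconstruct year, month, and
-- resource_name from the directory names rather than the file contents.
SELECT year, month, resource_name, metricname, metricvalue
FROM read_parquet('metrics/*/*/*/*.parquet', hive_partitioning = true)
LIMIT 5;
```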
Now that I am trying to manage these files under DuckLake, I create the table:
-- load catalog stuff here
create table metrics (
collectiondate timestamp
,metricname varchar
,metricvalue float
,"year" int
,"month" int
,resource_name varchar
);
alter table metrics
set partitioned by ("year","month",resource_name);
Now ...
CALL ducklake_add_data_files('my_test_catalog'
,'metrics'
,'s3://bucket/warehouse/metrics/year=2025/month=6/resource_name=server/data_0.parquet'
,allow_missing => true
);
loads the file, but when I run
select *
from metrics
year, month, and resource_name are all NULL.
Excluding these columns from the table DDL breaks the partitioning scheme, since I cannot partition on a column that does not exist in the table.
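As a workaround until a fix lands, the data could be inserted through a query instead of registered in place. Unlike `ducklake_add_data_files` this rewrites the Parquet data, but the partition columns are materialized correctly because `read_parquet` reconstructs them from the paths. A sketch, assuming the same bucket layout as above:

```sql
-- Read the Hive-partitioned files with the partition columns reconstructed
-- from the directory names, then insert into the DuckLake-managed table.
INSERT INTO metrics
SELECT collectiondate, metricname, metricvalue, year, month, resource_name
FROM read_parquet(
    's3://bucket/warehouse/metrics/*/*/*/*.parquet',
    hive_partitioning = true
);
```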
Edit: DuckLake version: d066f91, DuckDB version: 1.3.2
There's already a PR for adding Hive-partitioned data files: https://github.com/duckdb/ducklake/pull/252. It will be merged after DuckDB v1.4 is released.
@swasheck could you check whether the fix from https://github.com/duckdb/ducklake/pull/252 works for you? You can try installing from core nightly.
@guillesd I apologize for the delay, as I have been on summer holiday.
Yes, the issue persists.
DuckDB: 1.3.2, DuckLake (core_nightly): 10e2b8a
FORCE INSTALL ducklake from core_nightly;
LOAD ducklake;
UPDATE EXTENSIONS; -- ducklake core_nightly NO_UPDATE_AVAILABLE 10e2b8a 10e2b8a
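To double-check which build is actually loaded, the installed version and source repository can be queried from `duckdb_extensions()` (a sketch; column names are my assumption from the DuckDB extension documentation and may differ across versions):

```sql
-- Show the version and repository of the installed ducklake extension.
SELECT extension_name, extension_version, installed_from
FROM duckdb_extensions()
WHERE extension_name = 'ducklake';
```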
Everything else is the same as in my initial report.
Is there a specific implementation path to follow?
Hi @swasheck, this is my fault: core nightly is still not pointing at 0.3-dev. Also, you need DuckDB 1.4-dev to check. We are going to push 0.3-dev to core nightly, and then you can install 1.4-dev from here. I did test this fix on my side, and it works!