Parquet files generated from a random process

Open Harsh-Maheshwari opened this issue 6 months ago • 2 comments

Hi, I have a process that generates Parquet files daily at a specific path, in a Hive-partitioned layout.

I don't want to change this setup. I just want to monitor the base path of my Hive-partitioned data and register new files as transactions in DuckLake, so that when I read from another process I can pick the dataset version I want.

In short, can we decouple writing the Parquet files from the transaction update in DuckLake?

Or is it possible to specify the path and Parquet file name to use when appending to an existing table or when creating a new one?

Jun 17 '25 16:06 Harsh-Maheshwari

Does this meet your needs? https://github.com/duckdb/ducklake/pull/175

Jun 17 '25 21:06 Tishj

There's also documentation on this: https://ducklake.select/docs/stable/duckdb/metadata/adding_files. Note that it doesn't yet support adding Hive-partitioned files.

Jun 18 '25 07:06 Mytherin
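
For readers landing here, a minimal sketch of the decoupled "monitoring" step described in the question, using only the Python standard library. It scans a base directory for Parquet files that have not been registered yet, reads the `key=value` partition segments from each path, and builds the SQL statements a separate process could run against DuckLake. Everything here is an illustration, not DuckLake's own tooling: the helper names, the catalog name `my_ducklake`, the table name `events`, and the `registered` set are hypothetical, and the `ducklake_add_data_files(catalog, table, file_path)` call shape is assumed from the linked documentation. As noted above, Hive-partitioned files were not yet supported by that feature at the time of this thread, so treat the generated statements as a starting point only.

```python
from pathlib import Path


def parse_hive_partitions(file_path: Path, base: Path) -> dict[str, str]:
    """Return the key=value directory segments between `base` and the file."""
    partitions = {}
    for segment in file_path.relative_to(base).parent.parts:
        key, sep, value = segment.partition("=")
        if sep:  # skip directories that are not key=value
            partitions[key] = value
    return partitions


def find_new_files(base: Path, registered: set[str]) -> list[Path]:
    """List Parquet files under `base` whose path is not in `registered`."""
    return sorted(p for p in base.rglob("*.parquet") if str(p) not in registered)


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_add_file_calls(catalog: str, table: str, files: list[Path]) -> list[str]:
    """Build one CALL statement per file (call shape assumed from the docs)."""
    return [
        f"CALL ducklake_add_data_files({sql_literal(catalog)}, "
        f"{sql_literal(table)}, {sql_literal(str(f))});"
        for f in files
    ]


if __name__ == "__main__":
    base = Path("/data/events")   # hypothetical base path of the Hive layout
    registered: set[str] = set()  # hypothetical: paths already added to DuckLake
    new_files = find_new_files(base, registered)
    for f in new_files:
        print(f, parse_hive_partitions(f, base))
    for statement in build_add_file_calls("my_ducklake", "events", new_files):
        print(statement)
```

The writer process stays untouched in this design; only the scanner needs to remember which paths it has already registered, for example by persisting the `registered` set between runs.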