Parquet files generated from a random process
Hi, I have a process that generates Parquet files daily at a specific path, laid out in Hive-partitioned format.
I don't want to change this setup. I just want to monitor the base path of my Hive-partitioned dataset and register the new files as transactions in DuckLake, so that another process reading the data can pick whichever dataset version it wants.
In short, can we decouple writing the Parquet files from the transaction update in DuckLake?
Or is it possible to specify the path and Parquet file name to use when appending to an existing table or when creating a new one?
Does this meet your needs? https://github.com/duckdb/ducklake/pull/175
There's also documentation on this: https://ducklake.select/docs/stable/duckdb/metadata/adding_files. Note that it doesn't yet support adding Hive-partitioned files.
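In case it helps, here is a rough sketch of what the monitoring side of a decoupled setup could look like. It only scans the base path for Parquet files that have not been registered yet and builds the SQL statements to register them; it does not talk to DuckLake itself. The helper names, the `my_ducklake` catalog, and the `events` table are made up for illustration, and the `ducklake_add_data_files` call is assumed from the documentation page linked above, so please check its exact signature there. Given the limitation mentioned above, the generated statements may not work for Hive-partitioned files yet.

```python
from pathlib import Path


def find_new_parquet_files(base_dir, already_registered):
    """Return Parquet files under base_dir that are not yet registered, sorted."""
    found = sorted(str(p) for p in Path(base_dir).rglob("*.parquet"))
    return [p for p in found if p not in already_registered]


def hive_partition_values(file_path):
    """Extract key=value directory components from a Hive-style path."""
    values = {}
    for part in Path(file_path).parent.parts:
        key, sep, value = part.partition("=")
        if sep and key:
            values[key] = value
    return values


def build_add_file_statements(catalog, table, files):
    """Build one CALL statement per file (function name assumed from the docs)."""

    def quote(text):
        # Escape single quotes by doubling them, as in standard SQL literals.
        return "'" + text.replace("'", "''") + "'"

    return [
        f"CALL ducklake_add_data_files({quote(catalog)}, {quote(table)}, {quote(f)});"
        for f in files
    ]
```

A scheduled job could call `find_new_parquet_files` with the set of paths it has already registered, run the statements from `build_add_file_statements` through a DuckDB connection that has the DuckLake catalog attached, and then record the new paths. The writer process stays untouched, which is the decoupling asked about in the question.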