delta-rs
DeltaConfigKey values not being used (Rust)
Environment
Delta-rs version: 0.16.2
Binding: Rust deltalake crate
Environment: Windows 11
Bug
What happened: I have this function in my code to create a new delta table:
async fn create_initialized_table(table_path: &str) -> DeltaTable {
    DeltaOps::try_from_uri(table_path)
        .await
        .unwrap()
        .create()
        .with_columns(RecordStruct::columns())
        .with_partition_columns(["id_value".to_string()])
        .with_configuration_property(DeltaConfigKey::AutoOptimizeAutoCompact, Some("true"))
        .with_configuration_property(DeltaConfigKey::AutoOptimizeOptimizeWrite, Some("true"))
        .with_configuration_property(DeltaConfigKey::MinWriterVersion, Some("2"))
        .await
        .unwrap()
}
What you expected to happen: The initial delta log file written when the table is created should specify "minWriterVersion": 2, but it actually shows 1. Notably, it also doesn't seem to compact/optimize writes: if I make sequential writes to the same partition, they show up as many small parquet files instead of being consolidated into fewer large files (which are faster to read). I may be misunderstanding how the optimization/repartitioning that condenses many small parquet files into a single larger file is supposed to be triggered, but if not, it seems like none of these configuration properties are being propagated.
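For reference, this is a rough sketch of how I would compact the table manually, assuming DeltaOps::optimize and with_target_size are the right entry points in this version (the target size value is my own guess, not something from the docs). If autoCompact were honored, I would expect not to need this as a separate step:

use deltalake::{DeltaOps, DeltaTable, DeltaTableError};

async fn compact_table(table: DeltaTable) -> Result<DeltaTable, DeltaTableError> {
    // Rewrite the many small parquet files into fewer, larger ones.
    let (table, metrics) = DeltaOps(table)
        .optimize()
        .with_target_size(128 * 1024 * 1024) // target file size in bytes -- assumed value
        .await?;
    println!("optimize metrics: {:?}", metrics);
    Ok(table)
}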
How to reproduce it: Run the code as above with:
let table_uri = "delta_table_path";
let mut table = create_initialized_table(&table_uri).await;
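This is roughly how I'm checking what actually landed in the first commit, using only std (the zero-padded file name for commit version 0 is assumed):

use std::fs;

fn print_initial_commit(table_path: &str) -> std::io::Result<()> {
    let commit = format!("{table_path}/_delta_log/00000000000000000000.json");
    let contents = fs::read_to_string(commit)?;
    // The protocol action should contain "minWriterVersion":2 if the property took effect;
    // on my machine it shows 1 instead.
    for line in contents.lines() {
        if line.contains("protocol") || line.contains("configuration") {
            println!("{line}");
        }
    }
    Ok(())
}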
I'll look into this
@nholt01 I was unable to reproduce the issue on MacOS (I don't have access to a Windows machine), maybe someone else can try. Using your code (with one small difference: I used my own table schema), I get this in the table metadata: "configuration":{"delta.autoOptimize.optimizeWrite":"true","delta.autoOptimize.autoCompact":"true","delta.minWriterVersion":"2"}}}
As for the optimization, I could not find where the auto-compaction is actually performed, so it may not be implemented yet (i.e. you can set the property on a table, but acting on it is left to the writer implementation). Also, if it were implemented, I would expect some minimum number of small files to be required before auto compact is triggered. For example, Databricks has a spark.databricks.delta.autoCompact.minNumFiles property to control that (there is no such property in DeltaConfigKey in delta-rs).
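If you need that behavior today, a rough client-side workaround could look like the sketch below; the threshold and the recursive parquet count are my own invention, not a delta-rs feature:

use std::fs;
use std::path::Path;

// Hypothetical stand-in for spark.databricks.delta.autoCompact.minNumFiles.
const MIN_NUM_FILES: usize = 50;

// Recursively count .parquet files under the table root (partition folders included).
fn count_parquet_files(dir: &Path) -> std::io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Skip the transaction log directory.
            if path.file_name().map_or(false, |n| n == "_delta_log") {
                continue;
            }
            count += count_parquet_files(&path)?;
        } else if path.extension().map_or(false, |e| e == "parquet") {
            count += 1;
        }
    }
    Ok(count)
}

You would then only run optimize() on the table once count_parquet_files exceeds MIN_NUM_FILES.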
This is fixed now with a recent change to config handling.