delta-rs

DeltaConfigKey values not being used (Rust)

Open nholt01 opened this issue 2 years ago • 2 comments

Environment

Delta-rs version: 0.16.2

Binding: Rust deltalake crate

Environment: Windows, Windows 11


Bug

What happened: I have this function in my code to create a new delta table:

async fn create_initialized_table(table_path: &str) -> DeltaTable {
    DeltaOps::try_from_uri(table_path)
        .await
        .unwrap()
        .create()
        .with_columns(RecordStruct::columns())
        .with_partition_columns(["id_value".to_string()])
        .with_configuration_property(DeltaConfigKey::AutoOptimizeAutoCompact, Some("true"))
        .with_configuration_property(DeltaConfigKey::AutoOptimizeOptimizeWrite, Some("true"))
        .with_configuration_property(DeltaConfigKey::MinWriterVersion, Some("2"))
        .await
        .unwrap()
}

What you expected to happen: The initial delta log file written when the table is created should specify "minWriterVersion" as 2, but it shows 1 instead. The auto-optimize settings also don't seem to take effect: if I make sequential writes to the same partition of the table, they show up as many small parquet files instead of being consolidated into fewer, larger parquet files (which are faster to read). I may be misunderstanding how to perform the optimization/repartitioning that condenses many small parquet files into a single larger file. If not, it seems that none of these configuration properties are being propagated.

How to reproduce it: Run the code as above with:

let table_uri = "delta_table_path";
let mut table = create_initialized_table(&table_uri).await;

nholt01, Oct 22 '23 17:10

I'll look into this.

r3stl355, Nov 05 '23 10:11

@nholt01 I was unable to reproduce the issue on macOS (I don't have access to a Windows machine), so maybe someone else can try. Using your code, with the small difference that I used my own table schema, I get this in the table metadata: "configuration":{"delta.autoOptimize.optimizeWrite":"true","delta.autoOptimize.autoCompact":"true","delta.minWriterVersion":"2"}}}
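For anyone who wants to check this on their own table, here is a rough sketch of how the first commit file (`_delta_log/00000000000000000000.json`) could be inspected. That file is newline-delimited JSON: the `protocol` action carries `minWriterVersion` as a number, while the `metaData` action carries the `configuration` map of table properties. Reading both removes any ambiguity about which value is being looked at. The helper names `protocol_field` and `config_value` are made up for illustration and are not part of delta-rs, and the plain string matching is a deliberate simplification that assumes compact JSON. A real tool should use a proper JSON parser.

```rust
/// Hypothetical helper (not a delta-rs API): returns the integer that follows
/// `"<key>":` on the first log line that contains a `"protocol"` action.
fn protocol_field(commit_json: &str, key: &str) -> Option<u32> {
    let needle = format!("\"{}\":", key);
    commit_json
        .lines()
        .filter(|line| line.contains("\"protocol\""))
        .find_map(|line| {
            // Position just past `"key":`.
            let start = line.find(&needle)? + needle.len();
            let digits: String = line[start..]
                .chars()
                .skip_while(|c| c.is_whitespace())
                .take_while(|c| c.is_ascii_digit())
                .collect();
            digits.parse().ok()
        })
}

/// Hypothetical helper (not a delta-rs API): returns the string value stored
/// under `key` on the first log line that contains a `"metaData"` action.
/// Assumes the value is a JSON string with no escaped quotes and no
/// whitespace after the colon.
fn config_value<'a>(commit_json: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{}\":\"", key);
    commit_json
        .lines()
        .filter(|line| line.contains("\"metaData\""))
        .find_map(|line| {
            let start = line.find(&needle)? + needle.len();
            let len = line[start..].find('"')?;
            Some(&line[start..start + len])
        })
}
```

After loading the commit file with `std::fs::read_to_string`, calling `protocol_field(&text, "minWriterVersion")` and `config_value(&text, "delta.minWriterVersion")` shows the protocol value and the table property side by side.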

As for the optimization, I could not find the place in the code where it is actually performed, so perhaps it's not implemented (i.e. you can set the property on a table, but whether anything acts on it is up to the writer's implementation). Also, if it were implemented, I would expect some minimum number of small files to be required before auto compaction is triggered. For example, Databricks has the spark.databricks.delta.autoCompact.minNumFiles property to control that (there is no such property in DeltaConfigKey in delta-rs).
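To make the threshold idea concrete, here is a minimal sketch of the kind of check a writer might run before compacting. Everything in it is an assumption for illustration: `should_compact`, `small_file_bytes` and `min_num_files` are hypothetical names, and nothing like this is known to exist in delta-rs.

```rust
/// Hypothetical helper (not a delta-rs API): returns true when a partition
/// holds at least `min_num_files` files smaller than `small_file_bytes`,
/// i.e. when compaction would plausibly be worth triggering.
fn should_compact(file_sizes: &[u64], small_file_bytes: u64, min_num_files: usize) -> bool {
    let small_files = file_sizes
        .iter()
        .filter(|&&size| size < small_file_bytes)
        .count();
    small_files >= min_num_files
}
```

With a check like this, a handful of sequential writes to one partition would not trigger compaction until the number of small files crosses the configured minimum.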

r3stl355, Nov 05 '23 16:11

This is fixed now by the recent change to config handling.

ion-elgreco, May 29 '24 13:05