
Concurrent INSERT operations result in data loss despite successful execution

Open rampage644 opened this issue 3 months ago • 0 comments

Problem

Concurrent INSERT operations on the same table complete successfully but only the last operation's data persists to the table, resulting in silent data loss.

Reproduction

async fn test_concurrent_query_execution() {
    let session = create_df_session().await;

    // Create a test table

    let num_operations = 5;

    // Execute multiple parallel INSERT operations
    let mut handles = vec![];
    for i in 0..num_operations {
        let session_clone = session.clone();
        let handle = tokio::spawn(async move {
            let insert_query = format!("INSERT INTO concurrent_test VALUES ({})", i + 1);
            let mut query = session_clone.query(&insert_query, QueryContext::default());
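            query.execute().await
        });
        handles.push(handle);
    }

    // Wait for every INSERT task to finish; each one completes successfully
    for handle in handles {
        let _result = handle.await.expect("INSERT task panicked");
    }

    // SELECT COUNT(*) FROM concurrent_test
    // Only the last operation's data is present in the table.
}
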
  • The metastore's update_table seems to fail to persist concurrent changes. Adding per-key locking to SlateDBMetastore::update_table() fixes the behaviour.
  • The issue might sit at a different level and require a metastore interface change (select_for_update?).
  • Silent data loss
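
A minimal sketch of the per-key locking idea, using a toy in-memory store rather than the real metastore. Everything here is an assumption made for illustration: `ToyMetastore`, `lock_for`, `get`, `put`, and `run_concurrent_inserts` are hypothetical names, the store uses std threads instead of tokio tasks, and the real SlateDBMetastore::update_table() has a different signature. The point is only that the read-modify-write has to happen under a lock scoped to the key being updated, otherwise two writers can read the same old state and the later `put` overwrites the earlier one.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a metastore: table name -> rows committed so far.
#[derive(Default)]
struct ToyMetastore {
    tables: Mutex<HashMap<String, Vec<i64>>>,
    /// One lock per key, created lazily on first use.
    key_locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl ToyMetastore {
    /// Return the lock dedicated to `key`, creating it if needed.
    fn lock_for(&self, key: &str) -> Arc<Mutex<()>> {
        let mut locks = self.key_locks.lock().unwrap();
        Arc::clone(locks.entry(key.to_string()).or_default())
    }

    /// Read a snapshot of the table state (empty if the table is unknown).
    fn get(&self, key: &str) -> Vec<i64> {
        self.tables.lock().unwrap().get(key).cloned().unwrap_or_default()
    }

    /// Overwrite the table state with `rows`.
    fn put(&self, key: &str, rows: Vec<i64>) {
        self.tables.lock().unwrap().insert(key.to_string(), rows);
    }

    /// Read-modify-write, serialized per key. Without `_guard`, two callers
    /// could both read the same snapshot and the second `put` would drop
    /// the first caller's row.
    fn update_table(&self, key: &str, value: i64) {
        let lock = self.lock_for(key);
        let _guard = lock.lock().unwrap();
        let mut rows = self.get(key);
        thread::yield_now(); // widen the window in which a race could occur
        rows.push(value);
        self.put(key, rows);
    }
}

/// Run `n` concurrent single-row updates and return how many rows persisted.
fn run_concurrent_inserts(n: i64) -> usize {
    let store = Arc::new(ToyMetastore::default());
    let handles: Vec<_> = (1..=n)
        .map(|i| {
            let store = Arc::clone(&store);
            thread::spawn(move || store.update_table("concurrent_test", i))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    store.get("concurrent_test").len()
}
```

Calling `run_concurrent_inserts(5)` mirrors the five parallel INSERTs in the reproduction. Updates to different keys do not block each other, because each key gets its own lock. This only serializes writers inside one process; whether that is sufficient for the real metastore is the open question raised above.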

rampage644 avatar Sep 16 '25 21:09 rampage644