INSERT INTO SQL failing on CSV-backed table

Open singularsyntax opened this issue 1 year ago • 3 comments

Describe the bug

Hello,

When I try to insert data with the INSERT INTO SQL syntax (see reproduction code below), I get the error: Inserting query must have the same schema with the table.

[2024-05-01T00:48:23Z INFO] TABLE SCHEMA: DFSchema { fields: [DFField { qualifier: Some(Bare { table: "test" }), field: Field { name: "k", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, DFField { qualifier: Some(Bare { table: "test" }), field: Field { name: "v", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {}, functional_dependencies: FunctionalDependencies { deps: [FunctionalDependence { source_indices: [0], target_indices: [0, 1], nullable: false, mode: Single }] } }
[2024-05-01T00:48:23Z INFO] DATAFRAME SCHEMA: DFSchema { fields: [DFField { qualifier: None, field: Field { name: "k", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }, DFField { qualifier: None, field: Field { name: "v", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {}, functional_dependencies: FunctionalDependencies { deps: [] } }
thread 'main' panicked at src/main.rs:317:88:
called `Result::unwrap()` on an `Err` value: Plan("Inserting query must have the same schema with the table.")

As logged above, the problem appears to be a discrepancy between the two schemas: the table schema's fields are qualified with the table name, while the query schema's fields are unqualified.
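
To make the suspected mismatch concrete, here is a toy model of the two logged schemas. It is only a sketch: `ToyField`, `same_schema_strict`, and `same_schema_ignoring_qualifiers` are hypothetical names that stand in for the logged `DFField` values, and it does not reproduce whatever check DataFusion actually performs internally. It only shows that a comparison which includes the qualifier fails, while one that looks at name, type, and nullability passes.

```rust
// Toy model only: these types and functions are hypothetical and are NOT
// DataFusion's DFField/DFSchema or its real schema check.
#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    qualifier: Option<String>, // Some("test") for the table, None for the query
    name: String,
    data_type: String,
    nullable: bool,
}

// Build a non-nullable Utf8 field, matching the fields in the logged schemas.
fn field(qualifier: Option<&str>, name: &str) -> ToyField {
    ToyField {
        qualifier: qualifier.map(str::to_string),
        name: name.to_string(),
        data_type: "Utf8".to_string(),
        nullable: false,
    }
}

// Strict comparison: the qualifier takes part in equality.
fn same_schema_strict(a: &[ToyField], b: &[ToyField]) -> bool {
    a == b
}

// Relaxed comparison: only name, type, and nullability are compared.
fn same_schema_ignoring_qualifiers(a: &[ToyField], b: &[ToyField]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(x, y)| {
            x.name == y.name && x.data_type == y.data_type && x.nullable == y.nullable
        })
}

// Fields as they appear in the "TABLE SCHEMA" log line.
fn table_schema() -> Vec<ToyField> {
    vec![field(Some("test"), "k"), field(Some("test"), "v")]
}

// Fields as they appear in the "DATAFRAME SCHEMA" log line.
fn query_schema() -> Vec<ToyField> {
    vec![field(None, "k"), field(None, "v")]
}

fn main() {
    let table = table_schema();
    let query = query_schema();
    // prints "strict: false"
    println!("strict: {}", same_schema_strict(&table, &query));
    // prints "ignoring qualifiers: true"
    println!(
        "ignoring qualifiers: {}",
        same_schema_ignoring_qualifiers(&table, &query)
    );
}
```

Note that the logged schemas also differ in `functional_dependencies`, which this toy model leaves out.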

The code I'm using is about as simple as I can imagine. Am I missing something? Is there some example code that demonstrates how to use INSERT INTO SQL correctly? Or is this a bug?

To Reproduce

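use datafusion::dataframe::DataFrameWriteOptions;
use datafusion::prelude::SessionContext;
use log::info; // assumption: the `info!` macro comes from the `log` crate

async fn df_test() {
    let ctx = SessionContext::new();

    // Register a CSV-backed external table with two non-nullable string columns.
    let sql = "CREATE EXTERNAL TABLE test (k VARCHAR PRIMARY KEY NOT NULL, v VARCHAR NOT NULL) STORED AS CSV LOCATION './store/test/'";
    let df = ctx.sql(sql).await.unwrap();

    df.collect().await.unwrap();

    // Log the schema of the registered table.
    let table_df = ctx.table("test").await.unwrap();
    info!("TABLE SCHEMA: {:?}", table_df.schema());

    // Plan the INSERT statement and log the schema of the resulting DataFrame.
    let sql = "INSERT INTO test (k, v) VALUES ('foo', 'bar')";
    let query_df = ctx.sql(sql).await.unwrap();
    info!("DATAFRAME SCHEMA: {:?}", query_df.schema());

    // Panics here with: Plan("Inserting query must have the same schema with the table.")
    let _result = query_df.write_table("test", DataFrameWriteOptions::default()).await.unwrap();
}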

Expected behavior

Insertion of the row ('foo', 'bar') is successful. DataFusion creates a CSV file in the filesystem corresponding to the inserted data.

Additional context

[dependencies]
datafusion = "37.1.0"

singularsyntax avatar May 01 '24 00:05 singularsyntax

Additional information:

If I replace the call to write_table() with write_csv():

let _result = query_df.write_csv("foo", DataFrameWriteOptions::default(), None).await.unwrap();

I get the following error:

thread 'main' panicked at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/datafusion-physical-plan-37.1.0/src/insert.rs:127:9:
assertion `left == right` failed
  left: 2
 right: 1

singularsyntax avatar May 01 '24 01:05 singularsyntax

This looks like a bug. I wonder if this is a regression from #9595?

phillipleblanc avatar May 01 '24 03:05 phillipleblanc

I think it's a latent bug that is unrelated to #9595; I tested using the version 36 code. I can try to help figure out what's wrong with it. :)

yyy1000 avatar May 03 '24 05:05 yyy1000