Auto-update mechanism for dataframe test

Open jayzhan211 opened this issue 1 year ago • 6 comments

Is your feature request related to a problem or challenge?

While working on #10364, I found that updating the expected results in the Rust tests is quite painful.

Currently, we need to fix the expected strings manually, one by one.

It would be nice to have an easy way to update the tests.

In sqllogictest, we can easily do this with the --complete flag.

Describe the solution you'd like

Given a test like the one below, it would be good to have a very easy way to auto-update the expected result string:

#[tokio::test]
async fn test_fn_upper() -> Result<()> {
    let expr = upper(col("a"));

    let expected = [
        "+---------------+",
        "| upper(test.a) |",
        "+---------------+",
        "| ABCDEF        |",
        "| ABC123        |",
        "| CBADEF        |",
        "| 123ABCDEF     |",
        "+---------------+",
    ];
    assert_fn_batches!(expr, expected);

    Ok(())
}

Approach 1

One possible solution is to write the result to a file and compare against it, similar to sqllogictest. Since we still need to call the Expr API, the API calls remain in the Rust test and only the expected output goes to the output file.
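To make Approach 1 concrete, here is a minimal sketch using only the standard library. The helper names (`check_expected`, `update_requested`) and the `UPDATE_EXPECTED` environment variable are hypothetical and are not part of DataFusion or sqllogictest; the variable only plays the role that --complete plays there.

```rust
use std::env;
use std::fs;
use std::path::Path;

/// Compare `actual` with the contents of the expected-output file at `path`.
/// In update mode, overwrite the file with `actual` instead of comparing.
/// (Hypothetical helper, not an existing DataFusion API.)
fn check_expected(path: &Path, actual: &str, update: bool) -> Result<(), String> {
    if update {
        // Create the parent directory if needed, then (re)write the expected output.
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent).map_err(|e| e.to_string())?;
        }
        return fs::write(path, actual).map_err(|e| e.to_string());
    }

    let expected = fs::read_to_string(path)
        .map_err(|e| format!("cannot read {}: {e}", path.display()))?;

    if expected == actual {
        Ok(())
    } else {
        Err(format!(
            "output differs from {}\n--- expected\n{expected}\n--- actual\n{actual}",
            path.display()
        ))
    }
}

/// Assumed switch: setting UPDATE_EXPECTED (to any value) requests update mode.
fn update_requested() -> bool {
    env::var_os("UPDATE_EXPECTED").is_some()
}
```

A test would format its record batches to a string, call `check_expected(path, &actual, update_requested())`, and unwrap the result. Running the tests once with the variable set would then refresh every expected file in one go.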

Approach 2

Based on https://github.com/apache/datafusion/issues/8736, we can switch between a SQL string and an Expr, and compare the result the way sqllogictest does.

run_query might look like:

    let ctx = SessionContext::new();
    // one csv table per test file with the same name
    // so tests/data/example.csv is the table for tests in tests/data/example.slt
    let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
    // sql to expr
    let expr = ascii(col("a"));
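    let batches = df.select(vec![expr])?.collect().await?;
    // check the values like sqllogictest
    // (format the batches first; a Vec<RecordBatch> cannot be compared to a &str)
    let actual = arrow::util::pretty::pretty_format_batches(&batches)?.to_string();
    assert_eq!(actual, "expected string");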

Describe alternatives you've considered

No response

Additional context

No response

jayzhan211 avatar May 04 '24 04:05 jayzhan211

Something we have used to great effect in influxdb is https://insta.rs/

You can then do the equivalent of sqllogictest --complete (even for results within files) with a command like

cargo insta review

One downside is that it is yet another dependency (and to use it you need to install the tool with cargo install cargo-insta).

alamb avatar May 08 '24 17:05 alamb

I think this might be another approach, and we use it in databend. https://crates.io/crates/goldenfile

UPDATE_GOLDENFILES=1 cargo test

If we decide to use a specific approach, I think I can try to take this issue.

PsiACE avatar Jun 19 '24 15:06 PsiACE

We could give it a try! As long as the tests are easy to maintain, I will take it.

The goal of this issue is to fix #10364.

I stalled on this because #10364 is not a high-priority issue (we can easily add an alias to avoid the problem, and it is probably solved in #11020). Also, since datafusion/example is not a test, I cannot easily update it with cargo insta. Therefore, I am leaving it as is for now. I hope goldenfile can make datafusion/example easy to update too!

jayzhan211 avatar Jun 20 '24 12:06 jayzhan211

I think goldenfile can fulfill this requirement. I'll create some examples and submit a PR for evaluation. If everything works well, we can proceed with migrating the dataframe tests and examples. Please assign me this issue. cc @alamb @jayzhan211

PsiACE avatar Jun 21 '24 11:06 PsiACE

Thanks @PsiACE and @jayzhan211

alamb avatar Jun 21 '24 22:06 alamb

@PsiACE It would also be interesting to see how much the size increases after the change in #11105.

jayzhan211 avatar Jun 25 '24 09:06 jayzhan211