Auto-update mechanism for dataframe test
Is your feature request related to a problem or challenge?
While working on #10364, I found that changing the result in the rust test is quite painful.
Currently, we need to fix the string manually one by one
It would be nice if there is an easy way to update the test.
In sqllogictest, we can easily done it with --complete flag.
Describe the solution you'd like
Given the test, having a very easy way to auto-update result string
#[tokio::test]
async fn test_fn_upper() -> Result<()> {
let expr = upper(col("a"));
let expected = [
"+---------------+",
"| upper(test.a) |",
"+---------------+",
"| ABCDEF |",
"| ABC123 |",
"| CBADEF |",
"| 123ABCDEF |",
"+---------------+",
];
assert_fn_batches!(expr, expected);
Ok(())
}
Approach 1
One possible solution is writing the result to the file, and comparing it with similar to sqllogictest, but since we need to call expr API, the API calls remain in the rust test, and only the output goes to output file.
Approach 2
Based on https://github.com/apache/datafusion/issues/8736
We can switch between SQL string and Expr and compare the result like sqllogictest does
run_query may be like
let ctx = SessionContext::new();
// one csv table per test file with the same name
// so tests/data/example.csv is the table for tests in tests/data/example.slt
let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
// sql to expr
let expr = ascii(col("a"));
let df = df.select(vec![expr])?.collect().await?;
// check the values like sqllogictest
assert_eq!(df, "expected string");
Describe alternatives you've considered
No response
Additional context
No response
Something we have used to great effect in influxdb is https://insta.rs/
You can then do the equivalent of sqllogictest --complete (even for results within files) with a command like
cargo insta review
Some downsides are that it is is yet another dependency (and to use it you need to install cargo install cargo-insta
I think this might be another approach, and we use it in databend. https://crates.io/crates/goldenfile
UPDATE_GOLDENFILES=1 cargo test
If we decide to use a specific approach, I think I can try to take this issue.
I think this might be another approach, and we use it in databend. https://crates.io/crates/goldenfile
UPDATE_GOLDENFILES=1 cargo testIf we decide to use a specific approach, I think I can try to take this issue.
We could give it a try! As long the test is easy to maintain, I will take it.
The goal of this issue is to fix #10364 .
I stalled on the progress because #10364 is not a high priority issue (we can easily add alias to avoid the issue + it is probably solved in #11020), and since datafusion/example is not test, so I can not easily update it with cargo insta. Therefore, I just leave it there for now. I hope goldenfile could make datafusion/example easy to update too!
I think goldenfile can fulfill this requirement. I'll create some examples and submit a PR for evaluation. If everything works well, we can proceed with migrating the dataframe tests and examples. Please assign me this issue. cc @alamb @jayzhan211
Thanks @PsiACE and @jayzhan211
@PsiACE It would also be interesting that how much the size increase after the change #11105