Increase fuzz testing of streaming group by / low cardinality columns
Draft as it builds on https://github.com/apache/datafusion/pull/12847
Which issue does this PR close?
Part of #12114
Rationale for this change
As @Rachelint points out in https://github.com/apache/datafusion/pull/12847#discussion_r1803436799, the current coverage of sorted input is limited to sorting on columns of distinct values, which limits the number of sequential values in the data.
What changes are included in this PR?
Adds two low cardinality columns and ensures we have at least one query in each fuzz run that is entirely ordered by the sort key.
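As a rough illustration of why low cardinality columns help here, the sketch below generates a column drawn from only a few distinct values and measures the longest run of equal adjacent values once the column is sorted. It is not the generator used in this PR: `Lcg`, `low_cardinality_column`, and `longest_run` are hypothetical names, and the tiny random number generator only exists to keep the example free of external crates.

```rust
/// Minimal linear congruential generator (hypothetical helper for this sketch;
/// a real fuzz test would more likely use the `rand` crate).
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        // Widely used 64-bit LCG constants; the high bits are the better ones.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Generate `num_rows` values drawn from only `num_distinct` candidates.
/// (Assumption: a `u8` column, similar in spirit to the `u8_low` column above.)
fn low_cardinality_column(rng: &mut Lcg, num_rows: usize, num_distinct: u64) -> Vec<u8> {
    assert!((1..=256).contains(&num_distinct), "values must fit in a u8");
    (0..num_rows)
        .map(|_| (rng.next_u64() % num_distinct) as u8)
        .collect()
}

/// Length of the longest run of equal adjacent values.
fn longest_run(values: &[u8]) -> usize {
    let mut best = 0;
    let mut current = 0;
    let mut prev: Option<u8> = None;
    for &v in values {
        if prev == Some(v) {
            current += 1;
        } else {
            current = 1;
            prev = Some(v);
        }
        best = best.max(current);
    }
    best
}
```

Sorting a 1000-row column with 4 distinct values always leaves at least one run of 250 or more equal values, while a sorted column of all-distinct values never has a run longer than 1. Long runs like these are the case that sorting on distinct-valued columns alone does not produce.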
Here is an example of the kinds of queries that are tested:
Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
Testing with query SELECT min(i32) as col1, min(i64) as col2, min(i16) as col3 FROM fuzz_table GROUP BY i32, u8, largeutf8, utf8, i8, utf8_low, u16
Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
Testing with query SELECT min(i16) as col1, min(u8) as col2 FROM fuzz_table GROUP BY utf8_low, u8_low
Testing with query SELECT min(i32) as col1, min(i64) as col2, min(i16) as col3 FROM fuzz_table GROUP BY i32, u8, largeutf8, utf8, i8, utf8_low, u16
Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
Testing with query SELECT min(u8_low) as col1, min(i8) as col2, min(i16) as col3, min(u32) as col4 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(i16) as col1, min(u8) as col2 FROM fuzz_table GROUP BY utf8_low, u8_low
Testing with query SELECT min(u8) as col1, min(utf8) as col2, min(i16) as col3 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(i16) as col1, min(u8) as col2 FROM fuzz_table GROUP BY utf8_low, u8_low
Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(u16) as col1, min(u16) as col2, min(i64) as col3 FROM fuzz_table
Testing with query SELECT min(i32) as col1, min(i64) as col2, min(i16) as col3 FROM fuzz_table GROUP BY i32, u8, largeutf8, utf8, i8, utf8_low, u16
Testing with query SELECT min(u8) as col1, min(utf8) as col2, min(i16) as col3 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(i8) as col1 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(u8_low) as col1, min(i8) as col2, min(i16) as col3, min(u32) as col4 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(i8) as col1 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(u8_low) as col1, min(i32) as col2, min(i8) as col3 FROM fuzz_table GROUP BY utf8_low, i64, largeutf8, u64, utf8, i32, i16
Testing with query SELECT min(i32) as col1, min(i64) as col2, min(i16) as col3 FROM fuzz_table GROUP BY i32, u8, largeutf8, utf8, i8, utf8_low, u16
Testing with query SELECT min(i16) as col1, min(u8) as col2 FROM fuzz_table GROUP BY utf8_low, u8_low
Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
Testing with query SELECT min(u8_low) as col1, min(i8) as col2, min(i16) as col3, min(u32) as col4 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(u8_low) as col1, min(i8) as col2, min(i16) as col3, min(u32) as col4 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
Testing with query SELECT min(i8) as col1 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(i16) as col1, min(u8) as col2 FROM fuzz_table GROUP BY utf8_low, u8_low
Testing with query SELECT min(i8) as col1 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
Testing with query SELECT min(u8_low) as col1, min(i32) as col2, min(i8) as col3 FROM fuzz_table GROUP BY utf8_low, i64, largeutf8, u64, utf8, i32, i16
Testing with query SELECT min(u8_low) as col1, min(i8) as col2, min(i16) as col3, min(u32) as col4 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
Testing with query SELECT min(u8) as col1, min(utf8) as col2, min(i16) as col3 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(i8) as col1 FROM fuzz_table GROUP BY u8_low
Testing with query SELECT min(u16) as col1, min(u16) as col2, min(i64) as col3 FROM fuzz_table
Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
Testing with query SELECT min(u16) as col1, min(u16) as col2, min(i64) as col3 FROM fuzz_table
Testing with query SELECT min(u8_low) as col1, min(i32) as col2, min(i8) as col3 FROM fuzz_table GROUP BY utf8_low, i64, largeutf8, u64, utf8, i32, i16
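The queries in the log above all share one shape: a list of `min(...)` aggregates aliased `col1`, `col2`, ..., followed by an optional `GROUP BY`. A minimal sketch of assembling such a query string is shown below; `build_min_query` is a hypothetical helper written for illustration and is not the query builder used by the fuzz test itself.

```rust
/// Build a query of the shape seen in the log:
/// `SELECT min(a) as col1, ... FROM table [GROUP BY g1, g2, ...]`.
/// (Hypothetical helper; the column and table names are supplied by the caller.)
fn build_min_query(table: &str, agg_columns: &[&str], group_by: &[&str]) -> String {
    // Alias each aggregate as col1, col2, ... in order.
    let select_list = agg_columns
        .iter()
        .enumerate()
        .map(|(i, col)| format!("min({col}) as col{}", i + 1))
        .collect::<Vec<_>>()
        .join(", ");

    let mut query = format!("SELECT {select_list} FROM {table}");

    // An empty group-by list produces a plain aggregate over the whole table.
    if !group_by.is_empty() {
        query.push_str(" GROUP BY ");
        query.push_str(&group_by.join(", "));
    }
    query
}
```

For example, `build_min_query("fuzz_table", &["i16", "u8"], &["utf8_low", "u8_low"])` returns `SELECT min(i16) as col1, min(u8) as col2 FROM fuzz_table GROUP BY utf8_low, u8_low`, which matches one of the logged queries.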
Are these changes tested?
Only tests