
Increase fuzz testing of streaming group by / low cardinality columns

Open alamb opened this issue 1 year ago • 0 comments

Draft as it builds on https://github.com/apache/datafusion/pull/12847

Which issue does this PR close?

Part of #12114

Rationale for this change

As @Rachelint points out in https://github.com/apache/datafusion/pull/12847#discussion_r1803436799, the current fuzz coverage only sorts on columns of distinct values, which limits the number of sequential values in the data.

What changes are included in this PR?

Adds two low cardinality columns and ensures we have at least one query in each fuzz run that is entirely ordered by the sort key.
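To illustrate why low cardinality columns matter here, the following is a minimal std-only sketch, not the code in this PR. The `Lcg` generator and the helpers `low_cardinality_column` and `longest_run` are hypothetical names made up for this example. It shows that once a column drawn from only a few distinct values is sorted, it contains long runs of equal adjacent values, whereas a column of all-distinct values never has a run longer than one.

```rust
/// Tiny linear congruential generator so the sketch needs no external crates.
/// (Hypothetical helper for illustration only.)
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        // Advance the state with wrapping arithmetic and drop the weak low bits.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Generate `num_rows` values drawn from only `num_distinct` possible values.
fn low_cardinality_column(rng: &mut Lcg, num_rows: usize, num_distinct: u64) -> Vec<u64> {
    (0..num_rows).map(|_| rng.next_u64() % num_distinct).collect()
}

/// Length of the longest run of equal adjacent values (0 for empty input).
fn longest_run(values: &[u64]) -> usize {
    let mut best = 0;
    let mut current = 0;
    let mut prev: Option<u64> = None;
    for &v in values {
        if prev == Some(v) {
            current += 1;
        } else {
            current = 1;
            prev = Some(v);
        }
        best = best.max(current);
    }
    best
}
```

For example, sorting the result of `low_cardinality_column(&mut Lcg(42), 1000, 4)` must yield at least one run of 250 or more equal values (1000 rows spread over at most 4 values), while `longest_run` over 1000 distinct values is always 1.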

Here is an example of the kind of queries that are tested:

 Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
 Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
 Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
 Testing with query SELECT min(i32) as col1, min(i64) as col2, min(i16) as col3 FROM fuzz_table GROUP BY i32, u8, largeutf8, utf8, i8, utf8_low, u16
 Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
 Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
 Testing with query SELECT min(i16) as col1, min(u8) as col2 FROM fuzz_table GROUP BY utf8_low, u8_low
 Testing with query SELECT min(i32) as col1, min(i64) as col2, min(i16) as col3 FROM fuzz_table GROUP BY i32, u8, largeutf8, utf8, i8, utf8_low, u16
 Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
 Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
 Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
 Testing with query SELECT min(u8_low) as col1, min(i8) as col2, min(i16) as col3, min(u32) as col4 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i16) as col1, min(u8) as col2 FROM fuzz_table GROUP BY utf8_low, u8_low
 Testing with query SELECT min(u8) as col1, min(utf8) as col2, min(i16) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i16) as col1, min(u8) as col2 FROM fuzz_table GROUP BY utf8_low, u8_low
 Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(u16) as col1, min(u16) as col2, min(i64) as col3 FROM fuzz_table
 Testing with query SELECT min(i32) as col1, min(i64) as col2, min(i16) as col3 FROM fuzz_table GROUP BY i32, u8, largeutf8, utf8, i8, utf8_low, u16
 Testing with query SELECT min(u8) as col1, min(utf8) as col2, min(i16) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i8) as col1 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
 Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(u8_low) as col1, min(i8) as col2, min(i16) as col3, min(u32) as col4 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i8) as col1 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(u8_low) as col1, min(i32) as col2, min(i8) as col3 FROM fuzz_table GROUP BY utf8_low, i64, largeutf8, u64, utf8, i32, i16
 Testing with query SELECT min(i32) as col1, min(i64) as col2, min(i16) as col3 FROM fuzz_table GROUP BY i32, u8, largeutf8, utf8, i8, utf8_low, u16
 Testing with query SELECT min(i16) as col1, min(u8) as col2 FROM fuzz_table GROUP BY utf8_low, u8_low
 Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
 Testing with query SELECT min(u8_low) as col1, min(i8) as col2, min(i16) as col3, min(u32) as col4 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(u8_low) as col1, min(i8) as col2, min(i16) as col3, min(u32) as col4 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
 Testing with query SELECT min(i8) as col1 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i16) as col1, min(u8) as col2 FROM fuzz_table GROUP BY utf8_low, u8_low
 Testing with query SELECT min(i8) as col1 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i64) as col1, min(utf8) as col2, min(utf8) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
 Testing with query SELECT min(u8_low) as col1, min(i32) as col2, min(i8) as col3 FROM fuzz_table GROUP BY utf8_low, i64, largeutf8, u64, utf8, i32, i16
 Testing with query SELECT min(u8_low) as col1, min(i8) as col2, min(i16) as col3, min(u32) as col4 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i64) as col1, min(u16) as col2, min(i32) as col3, min(i64) as col4 FROM fuzz_table GROUP BY utf8_low
 Testing with query SELECT min(u8) as col1, min(utf8) as col2, min(i16) as col3 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(i8) as col1 FROM fuzz_table GROUP BY u8_low
 Testing with query SELECT min(u16) as col1, min(u16) as col2, min(i64) as col3 FROM fuzz_table
 Testing with query SELECT min(u32) as col1, min(i32) as col2, min(u16) as col3, min(largeutf8) as col4 FROM fuzz_table GROUP BY u64, i64, utf8_low, u8, u32, utf8
 Testing with query SELECT min(u16) as col1, min(u16) as col2, min(i64) as col3 FROM fuzz_table
 Testing with query SELECT min(u8_low) as col1, min(i32) as col2, min(i8) as col3 FROM fuzz_table GROUP BY utf8_low, i64, largeutf8, u64, utf8, i32, i16
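The shape of the queries in the log above can be sketched as follows. This is an illustrative assumption, not the fuzz harness's actual query builder: `build_query` is a hypothetical helper that takes the columns to aggregate and the columns to group by, and omits the `GROUP BY` clause when no grouping columns are given (as in the queries above that have none).

```rust
/// Assemble a `min(...)` aggregate query in the same textual shape as the log.
/// (Hypothetical helper for illustration only.)
fn build_query(table: &str, agg_columns: &[&str], group_by: &[&str]) -> String {
    // Each aggregate is aliased col1, col2, ... in order.
    let select_list = agg_columns
        .iter()
        .enumerate()
        .map(|(i, col)| format!("min({col}) as col{}", i + 1))
        .collect::<Vec<_>>()
        .join(", ");

    let mut query = format!("SELECT {select_list} FROM {table}");

    // Only add GROUP BY when there is at least one grouping column.
    if !group_by.is_empty() {
        query.push_str(" GROUP BY ");
        query.push_str(&group_by.join(", "));
    }
    query
}
```

Calling `build_query("fuzz_table", &["i16", "u8"], &["utf8_low", "u8_low"])` reproduces one of the logged queries that groups only on the two low cardinality columns.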

Are these changes tested?

Only tests

Are there any user-facing changes?

alamb, Oct 17 '24 17:10