datafusion-comet
datafusion-comet copied to clipboard
Many tests appear to test dictionary-encoded arrays but actually do not
Describe the bug
Many tests use the following pattern to cover testing with dictionary-encoded data.
Seq("true", "false").foreach { dictionary =>
withParquetTable(
(-5 until 5).map(i => (i.toDouble + 0.3, i.toDouble + 0.8)),
"tbl",
withDictionary = dictionary.toBoolean) {
However, in many cases, including this example, the data does not contain repeated values and therefore does not cause any dictionary-encoded arrays to be created.
PR https://github.com/apache/datafusion-comet/pull/752 adds an assertion to detect such cases and updates some tests to stop testing with withDictionary = true, linking to this issue for tracking.
We can close this issue when these tests have been updated to actually test dictionary-encoded inputs.