datafusion-comet icon indicating copy to clipboard operation
datafusion-comet copied to clipboard

Many tests appear to test dictionary-encoded arrays but actually do not

Open andygrove opened this issue 1 year ago • 0 comments

Describe the bug

Many tests use the following pattern to cover testing with dictionary-encoded data.

    Seq("true", "false").foreach { dictionary =>
        withParquetTable(
          (-5 until 5).map(i => (i.toDouble + 0.3, i.toDouble + 0.8)),
          "tbl",
          withDictionary = dictionary.toBoolean) {

However, in many cases, including this example, the data does not contain repeated values and therefore does not cause any dictionary-encoded arrays to be created.

PR https://github.com/apache/datafusion-comet/pull/752 adds an assertion to detect such cases and updates some tests to stop testing with withDictionary = true, linking to this issue for tracking.

We can close this issue when these tests have been updated to actually test dictionary-encoded inputs.

andygrove avatar Aug 01 '24 16:08 andygrove