datafusion-comet
datafusion-comet copied to clipboard
chore: Log warning for tests that are not really testing dictionary-encoded Parquet files
Which issue does this PR close?
N/A
Rationale for this change
Many tests use the following pattern to cover testing with dictionary-encoded data.
Seq("true", "false").foreach { dictionary =>
withParquetTable(
(-5 until 5).map(i => (i.toDouble + 0.3, i.toDouble + 0.8)),
"tbl",
withDictionary = dictionary.toBoolean) {
However, in many cases, including this example, the data does not contain repeated values and therefore does not cause any dictionary-encoded arrays to be created.
What changes are included in this PR?
Log a warning when withParquetTable is called with withDictionary=true and when the parquet file does not actually contain dictionary-encoded data.
I was originally going to add an assertion instead of a log message but it was too disruptive.
Example output from running tests:
- test multiple pages with mixed PLAIN_DICTIONARY and PLAIN encoding (prefetch enabled) (709 milliseconds)
- test multiple pages with mixed PLAIN_DICTIONARY and PLAIN encoding (524 milliseconds)
WARN: withParquetFile was called with withDictionary=true but did not write any dictionary-encoded columns
- skip vector re-loading (prefetch enabled) (669 milliseconds)
WARN: withParquetFile was called with withDictionary=true but did not write any dictionary-encoded columns
How are these changes tested?
No functional change to test. Manually tested that logging appears.