[Flink] Disable statistics for source if the predicates are all value filters
Purpose
Linked issue: close #2185
Tests
DataTableSourceTest#testCheckAllValuePredication
FileStoreTableStatisticsTestBase#testTableFilterValueDisableStatistics
Below are the TPC-H test results:
Env: JM 64 cores; TM 16 cores, 8 slots per TM; Paimon 0.5 x Flink 1.17
Table options: no partitions. The left column was measured with source.value-filter-statistics.disable = true; the right column with source.value-filter-statistics.disable = false.
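For reference, a minimal sketch of how the option could be toggled for such a run from Flink SQL. It assumes the option is exposed as an ordinary table option under the name used in this PR; the table name `lineitem` and the filter on `l_quantity` are illustrative only and are not taken from the benchmark setup.

```sql
-- Assumption: `lineitem` is a hypothetical Paimon table, and the option name
-- is the one proposed in this PR.

-- Set the option on the table itself.
ALTER TABLE lineitem SET ('source.value-filter-statistics.disable' = 'true');

-- Or override it for a single query with a dynamic table option hint
-- (relies on Flink's dynamic table options being enabled).
SELECT COUNT(*)
FROM lineitem /*+ OPTIONS('source.value-filter-statistics.disable' = 'true') */
WHERE l_quantity < 24;
```

The hint form makes it easier to compare both settings on the same table without altering its stored options between runs.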
| query / latency (ms) | source.value-filter-statistics.disable = true | source.value-filter-statistics.disable = false |
|---|---|---|
| 1 | 7300 | 14745 |
| 2 | 13768 | 14179 |
| 3 | 9323 | 9490 |
| 4 | 5957 | 6091 |
| 5 | 13720 | 133810 |
| 6 | 3849 | 4664 |
| 7 | 25553 | 31348 |
| 8 | 22970 | 23417 |
| 9 | 26187 | 26653 |
| 10 | 10630 | 33056 |
| 11 | 9491 | 9062 |
| 12 | 7741 | 40244 |
| 13 | 8082 | 8291 |
| 14 | 5829 | 9153 |
| 15 | 6515 | 6922 |
| 16 | 8650 | 8515 |
| 17 | 13995 | 38186 |
| 18 | 15374 | 17667 |
| 19 | 7780 | 3305 |
| 20 | 11588 | 11195 |
| 21 | 14273 | 15423 |
| 22 | 7920 | 8090 |
As the table shows, most queries (18 of 22) run faster with source.value-filter-statistics.disable = true than with it set to false; only queries 11, 16, 19 and 20 are slower. Query 5 stands out: with statistics disabled it finishes well within 2 minutes (13720 ms versus 133810 ms) and no longer times out.
@JingsongLi We added some benchmarks for this issue. Please help review them when you're free, thanks.
Thanks @schnappi17 and @FangYongs for the benchmark. Can you also test with the Flink config table.optimizer.source.report-statistics-enabled set to false?
Sure, we can do more tests on this.
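A minimal sketch of how the follow-up run could set the Flink option requested above, assuming the queries are submitted through the Flink SQL client. The option defaults to true, so it has to be turned off explicitly for the session.

```sql
-- Stop the planner from asking sources to report statistics
-- for the queries that follow in this session.
SET 'table.optimizer.source.report-statistics-enabled' = 'false';
```

Running the same 22 queries with this setting would show whether the planner-level switch gives results comparable to the table-level option proposed here.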