
[Flink] Disable statistics for source if the predicates are all value filters

Open schnappi17 opened this issue 2 years ago • 3 comments

Purpose

Linked issue: close #2185

Tests

- DataTableSourceTest#testCheckAllValuePredication
- FileStoreTableStatisticsTestBase#testTableFilterValueDisableStatistics

Below is the TPC-H test result.

- Env: JM 64 cores; TM 16 cores, 8 slots per TM; Paimon 0.5 x Flink 1.17
- Table options: no partitions

The left column was measured with source.value-filter-statistics.disable = true; the right column was measured with source.value-filter-statistics.disable = false.

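| Query | Latency (ms), source.value-filter-statistics.disable = true | Latency (ms), source.value-filter-statistics.disable = false |
| ---: | ---: | ---: |
| 1 | 7300 | 14745 |
| 2 | 13768 | 14179 |
| 3 | 9323 | 9490 |
| 4 | 5957 | 6091 |
| 5 | 13720 | 133810 |
| 6 | 3849 | 4664 |
| 7 | 25553 | 31348 |
| 8 | 22970 | 23417 |
| 9 | 26187 | 26653 |
| 10 | 10630 | 33056 |
| 11 | 9491 | 9062 |
| 12 | 7741 | 40244 |
| 13 | 8082 | 8291 |
| 14 | 5829 | 9153 |
| 15 | 6515 | 6922 |
| 16 | 8650 | 8515 |
| 17 | 13995 | 38186 |
| 18 | 15374 | 17667 |
| 19 | 7780 | 3305 |
| 20 | 11588 | 11195 |
| 21 | 14273 | 15423 |
| 22 | 7920 | 8090 |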

As we can see, most of the queries (18 of the 22) run faster with source.value-filter-statistics.disable = true than with it set to false; queries 11, 16, 19 and 20 are the exceptions. The gain is largest for query 5, which drops from 133810 ms to 13720 ms: with statistics disabled, it completes within 2 minutes and does not time out.
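To make the comparison concrete, here is a minimal sketch that computes the ratio between the two latencies for a few of the queries reported above. The class `SpeedupReport` and its `speedup` method are hypothetical names used only for illustration and are not part of Paimon or Flink; the latencies are copied from the benchmark results.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Paimon or Flink.
class SpeedupReport {

    // Ratio of the latency with statistics enabled to the latency with
    // statistics disabled. A value above 1.0 means disabling was faster.
    static double speedup(long disabledMs, long enabledMs) {
        return (double) enabledMs / disabledMs;
    }

    public static void main(String[] args) {
        // query -> { latency with disable = true, latency with disable = false }, in ms
        Map<Integer, long[]> results = new LinkedHashMap<>();
        results.put(1, new long[] {7300, 14745});
        results.put(5, new long[] {13720, 133810});
        results.put(10, new long[] {10630, 33056});
        results.put(12, new long[] {7741, 40244});
        results.put(17, new long[] {13995, 38186});
        results.put(19, new long[] {7780, 3305});

        for (Map.Entry<Integer, long[]> e : results.entrySet()) {
            long[] ms = e.getValue();
            System.out.printf("query %d: %.2fx%n", e.getKey(), speedup(ms[0], ms[1]));
        }
    }
}
```

A ratio below 1.0, as for query 19, marks a query that became slower with statistics disabled, so it may be worth looking at those cases separately.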

schnappi17, Oct 30 '23 03:10

@JingsongLi We added some benchmarks for this issue. Please help review it when you're free, thanks.

FangYongs, Dec 26 '23 10:12

Thanks @schnappi17 and @FangYongs for the benchmark. Can you also test with the Flink config table.optimizer.source.report-statistics-enabled set to false?

JingsongLi, Jan 03 '24 02:01

> Thanks @schnappi17 and @FangYongs for the benchmark. Can you also test with the Flink config table.optimizer.source.report-statistics-enabled set to false?

Sure, we can do more tests on this.

schnappi17, Jan 03 '24 03:01