datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Prune files during streams and avoid additional pruning if there are no dynamic filters

Open adriangb opened this issue 6 months ago โ€ข 18 comments

https://github.com/apache/datafusion/pull/16014#issuecomment-2977125894

adriangb avatar Jun 16 '25 16:06 adriangb

cc @alamb I think this resolves the concern about perf overhead of this late pruning when there are no dynamic filters; it's a tossup of what happens when there are dynamic filters, in the case of a topk with large files it's clearly a win, but there could obviously be cases where the additional checks are more overhead if they don't result in early termination of the streams

adriangb avatar Jun 16 '25 16:06 adriangb

๐Ÿค– ./gh_compare_branch.sh Benchmark Script Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing prune-rg (936e039c84190e7345a5b4cff25d5e043c7b18d6) to dd936cb1b25cb685e0e146f297950eb00048c64c diff Benchmarks: tpch_mem clickbench_partitioned clickbench_extended Results will be posted here when complete

alamb avatar Jun 16 '25 21:06 alamb

Do we expect the benchmarks to show anything? I don't think they're using dynamic filters right? Maybe we need to merge https://github.com/apache/datafusion/pull/15770 and then we can benchmark this?

adriangb avatar Jun 16 '25 21:06 adriangb

๐Ÿค–: Benchmark completed

Details

Comparing HEAD and prune-rg
--------------------
Benchmark clickbench_extended.json
--------------------
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ Query        โ”ƒ        HEAD โ”ƒ    prune-rg โ”ƒ    Change โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ QQuery 0     โ”‚  1879.99 ms โ”‚  1939.08 ms โ”‚ no change โ”‚
โ”‚ QQuery 1     โ”‚   693.65 ms โ”‚   708.16 ms โ”‚ no change โ”‚
โ”‚ QQuery 2     โ”‚  1355.42 ms โ”‚  1393.13 ms โ”‚ no change โ”‚
โ”‚ QQuery 3     โ”‚   669.90 ms โ”‚   672.00 ms โ”‚ no change โ”‚
โ”‚ QQuery 4     โ”‚  1337.47 ms โ”‚  1363.59 ms โ”‚ no change โ”‚
โ”‚ QQuery 5     โ”‚ 15038.90 ms โ”‚ 15112.89 ms โ”‚ no change โ”‚
โ”‚ QQuery 6     โ”‚  1986.81 ms โ”‚  1965.90 ms โ”‚ no change โ”‚
โ”‚ QQuery 7     โ”‚  1929.58 ms โ”‚  1936.37 ms โ”‚ no change โ”‚
โ”‚ QQuery 8     โ”‚   799.31 ms โ”‚   798.86 ms โ”‚ no change โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ Benchmark Summary       โ”ƒ            โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ Total Time (HEAD)       โ”‚ 25691.04ms โ”‚
โ”‚ Total Time (prune-rg)   โ”‚ 25889.98ms โ”‚
โ”‚ Average Time (HEAD)     โ”‚  2854.56ms โ”‚
โ”‚ Average Time (prune-rg) โ”‚  2876.66ms โ”‚
โ”‚ Queries Faster          โ”‚          0 โ”‚
โ”‚ Queries Slower          โ”‚          0 โ”‚
โ”‚ Queries with No Change  โ”‚          9 โ”‚
โ”‚ Queries with Failure    โ”‚          0 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
--------------------
Benchmark clickbench_partitioned.json
--------------------
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ Query        โ”ƒ        HEAD โ”ƒ    prune-rg โ”ƒ        Change โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ QQuery 0     โ”‚    16.59 ms โ”‚    15.61 ms โ”‚ +1.06x faster โ”‚
โ”‚ QQuery 1     โ”‚    33.33 ms โ”‚    32.71 ms โ”‚     no change โ”‚
โ”‚ QQuery 2     โ”‚    80.97 ms โ”‚    79.03 ms โ”‚     no change โ”‚
โ”‚ QQuery 3     โ”‚    98.84 ms โ”‚   101.19 ms โ”‚     no change โ”‚
โ”‚ QQuery 4     โ”‚   589.88 ms โ”‚   617.40 ms โ”‚     no change โ”‚
โ”‚ QQuery 5     โ”‚   822.41 ms โ”‚   848.28 ms โ”‚     no change โ”‚
โ”‚ QQuery 6     โ”‚    23.72 ms โ”‚    23.39 ms โ”‚     no change โ”‚
โ”‚ QQuery 7     โ”‚    36.21 ms โ”‚    35.69 ms โ”‚     no change โ”‚
โ”‚ QQuery 8     โ”‚   857.78 ms โ”‚   867.88 ms โ”‚     no change โ”‚
โ”‚ QQuery 9     โ”‚  1165.86 ms โ”‚  1172.75 ms โ”‚     no change โ”‚
โ”‚ QQuery 10    โ”‚   252.83 ms โ”‚   253.69 ms โ”‚     no change โ”‚
โ”‚ QQuery 11    โ”‚   282.67 ms โ”‚   281.46 ms โ”‚     no change โ”‚
โ”‚ QQuery 12    โ”‚   854.98 ms โ”‚   846.96 ms โ”‚     no change โ”‚
โ”‚ QQuery 13    โ”‚  1257.05 ms โ”‚  1266.75 ms โ”‚     no change โ”‚
โ”‚ QQuery 14    โ”‚   801.95 ms โ”‚   785.82 ms โ”‚     no change โ”‚
โ”‚ QQuery 15    โ”‚   777.20 ms โ”‚   764.66 ms โ”‚     no change โ”‚
โ”‚ QQuery 16    โ”‚  1633.65 ms โ”‚  1595.44 ms โ”‚     no change โ”‚
โ”‚ QQuery 17    โ”‚  1595.53 ms โ”‚  1582.06 ms โ”‚     no change โ”‚
โ”‚ QQuery 18    โ”‚  2896.26 ms โ”‚  2864.05 ms โ”‚     no change โ”‚
โ”‚ QQuery 19    โ”‚    86.57 ms โ”‚    84.57 ms โ”‚     no change โ”‚
โ”‚ QQuery 20    โ”‚  1119.10 ms โ”‚  1094.90 ms โ”‚     no change โ”‚
โ”‚ QQuery 21    โ”‚  1243.84 ms โ”‚  1271.98 ms โ”‚     no change โ”‚
โ”‚ QQuery 22    โ”‚  2064.74 ms โ”‚  2070.37 ms โ”‚     no change โ”‚
โ”‚ QQuery 23    โ”‚  7537.75 ms โ”‚  7537.46 ms โ”‚     no change โ”‚
โ”‚ QQuery 24    โ”‚   446.88 ms โ”‚   445.02 ms โ”‚     no change โ”‚
โ”‚ QQuery 25    โ”‚   367.87 ms โ”‚   374.34 ms โ”‚     no change โ”‚
โ”‚ QQuery 26    โ”‚   503.34 ms โ”‚   506.30 ms โ”‚     no change โ”‚
โ”‚ QQuery 27    โ”‚  1481.94 ms โ”‚  1504.55 ms โ”‚     no change โ”‚
โ”‚ QQuery 28    โ”‚ 11763.16 ms โ”‚ 11902.61 ms โ”‚     no change โ”‚
โ”‚ QQuery 29    โ”‚   525.71 ms โ”‚   532.18 ms โ”‚     no change โ”‚
โ”‚ QQuery 30    โ”‚   752.63 ms โ”‚   753.79 ms โ”‚     no change โ”‚
โ”‚ QQuery 31    โ”‚   801.83 ms โ”‚   796.85 ms โ”‚     no change โ”‚
โ”‚ QQuery 32    โ”‚  2494.56 ms โ”‚  2480.09 ms โ”‚     no change โ”‚
โ”‚ QQuery 33    โ”‚  3143.19 ms โ”‚  3172.92 ms โ”‚     no change โ”‚
โ”‚ QQuery 34    โ”‚  3150.66 ms โ”‚  3179.61 ms โ”‚     no change โ”‚
โ”‚ QQuery 35    โ”‚  1225.93 ms โ”‚  1238.68 ms โ”‚     no change โ”‚
โ”‚ QQuery 36    โ”‚   123.81 ms โ”‚   124.66 ms โ”‚     no change โ”‚
โ”‚ QQuery 37    โ”‚    55.59 ms โ”‚    55.17 ms โ”‚     no change โ”‚
โ”‚ QQuery 38    โ”‚   121.16 ms โ”‚   124.64 ms โ”‚     no change โ”‚
โ”‚ QQuery 39    โ”‚   195.41 ms โ”‚   195.87 ms โ”‚     no change โ”‚
โ”‚ QQuery 40    โ”‚    46.69 ms โ”‚    48.51 ms โ”‚     no change โ”‚
โ”‚ QQuery 41    โ”‚    44.18 ms โ”‚    43.08 ms โ”‚     no change โ”‚
โ”‚ QQuery 42    โ”‚    39.09 ms โ”‚    38.68 ms โ”‚     no change โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ Benchmark Summary       โ”ƒ            โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ Total Time (HEAD)       โ”‚ 53413.35ms โ”‚
โ”‚ Total Time (prune-rg)   โ”‚ 53611.67ms โ”‚
โ”‚ Average Time (HEAD)     โ”‚  1242.17ms โ”‚
โ”‚ Average Time (prune-rg) โ”‚  1246.78ms โ”‚
โ”‚ Queries Faster          โ”‚          1 โ”‚
โ”‚ Queries Slower          โ”‚          0 โ”‚
โ”‚ Queries with No Change  โ”‚         42 โ”‚
โ”‚ Queries with Failure    โ”‚          0 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
--------------------
Benchmark tpch_mem_sf1.json
--------------------
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ Query        โ”ƒ      HEAD โ”ƒ  prune-rg โ”ƒ       Change โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ QQuery 1     โ”‚ 100.52 ms โ”‚ 100.26 ms โ”‚    no change โ”‚
โ”‚ QQuery 2     โ”‚  21.21 ms โ”‚  21.35 ms โ”‚    no change โ”‚
โ”‚ QQuery 3     โ”‚  32.52 ms โ”‚  32.87 ms โ”‚    no change โ”‚
โ”‚ QQuery 4     โ”‚  19.08 ms โ”‚  18.72 ms โ”‚    no change โ”‚
โ”‚ QQuery 5     โ”‚  51.07 ms โ”‚  50.15 ms โ”‚    no change โ”‚
โ”‚ QQuery 6     โ”‚  11.90 ms โ”‚  12.19 ms โ”‚    no change โ”‚
โ”‚ QQuery 7     โ”‚  85.38 ms โ”‚  89.71 ms โ”‚ 1.05x slower โ”‚
โ”‚ QQuery 8     โ”‚  24.32 ms โ”‚  25.05 ms โ”‚    no change โ”‚
โ”‚ QQuery 9     โ”‚  53.59 ms โ”‚  54.12 ms โ”‚    no change โ”‚
โ”‚ QQuery 10    โ”‚  43.80 ms โ”‚  43.31 ms โ”‚    no change โ”‚
โ”‚ QQuery 11    โ”‚  11.57 ms โ”‚  11.31 ms โ”‚    no change โ”‚
โ”‚ QQuery 12    โ”‚  35.33 ms โ”‚  34.53 ms โ”‚    no change โ”‚
โ”‚ QQuery 13    โ”‚  25.59 ms โ”‚  26.29 ms โ”‚    no change โ”‚
โ”‚ QQuery 14    โ”‚   9.80 ms โ”‚   9.68 ms โ”‚    no change โ”‚
โ”‚ QQuery 15    โ”‚  18.63 ms โ”‚  19.74 ms โ”‚ 1.06x slower โ”‚
โ”‚ QQuery 16    โ”‚  19.16 ms โ”‚  18.61 ms โ”‚    no change โ”‚
โ”‚ QQuery 17    โ”‚  97.35 ms โ”‚  96.99 ms โ”‚    no change โ”‚
โ”‚ QQuery 18    โ”‚ 205.89 ms โ”‚ 200.40 ms โ”‚    no change โ”‚
โ”‚ QQuery 19    โ”‚  27.10 ms โ”‚  26.81 ms โ”‚    no change โ”‚
โ”‚ QQuery 20    โ”‚  32.14 ms โ”‚  32.06 ms โ”‚    no change โ”‚
โ”‚ QQuery 21    โ”‚ 152.12 ms โ”‚ 148.42 ms โ”‚    no change โ”‚
โ”‚ QQuery 22    โ”‚  15.22 ms โ”‚  15.37 ms โ”‚    no change โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ Benchmark Summary       โ”ƒ           โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ Total Time (HEAD)       โ”‚ 1093.28ms โ”‚
โ”‚ Total Time (prune-rg)   โ”‚ 1087.91ms โ”‚
โ”‚ Average Time (HEAD)     โ”‚   49.69ms โ”‚
โ”‚ Average Time (prune-rg) โ”‚   49.45ms โ”‚
โ”‚ Queries Faster          โ”‚         0 โ”‚
โ”‚ Queries Slower          โ”‚         2 โ”‚
โ”‚ Queries with No Change  โ”‚        20 โ”‚
โ”‚ Queries with Failure    โ”‚         0 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

alamb avatar Jun 16 '25 22:06 alamb

๐Ÿค– ./gh_compare_branch.sh Benchmark Script Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing prune-rg (936e039c84190e7345a5b4cff25d5e043c7b18d6) to dd936cb1b25cb685e0e146f297950eb00048c64c diff Benchmarks: clickbench_1 Results will be posted here when complete

alamb avatar Jun 16 '25 22:06 alamb

๐Ÿค–: Benchmark completed

Details

Comparing HEAD and prune-rg
--------------------
Benchmark clickbench_1.json
--------------------
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ Query        โ”ƒ        HEAD โ”ƒ    prune-rg โ”ƒ        Change โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ QQuery 0     โ”‚    48.55 ms โ”‚    48.75 ms โ”‚     no change โ”‚
โ”‚ QQuery 1     โ”‚    74.22 ms โ”‚    74.11 ms โ”‚     no change โ”‚
โ”‚ QQuery 2     โ”‚   109.42 ms โ”‚   109.88 ms โ”‚     no change โ”‚
โ”‚ QQuery 3     โ”‚   129.53 ms โ”‚   122.61 ms โ”‚ +1.06x faster โ”‚
โ”‚ QQuery 4     โ”‚   627.55 ms โ”‚   625.42 ms โ”‚     no change โ”‚
โ”‚ QQuery 5     โ”‚   849.86 ms โ”‚   849.16 ms โ”‚     no change โ”‚
โ”‚ QQuery 6     โ”‚    57.05 ms โ”‚    56.90 ms โ”‚     no change โ”‚
โ”‚ QQuery 7     โ”‚    80.49 ms โ”‚    82.69 ms โ”‚     no change โ”‚
โ”‚ QQuery 8     โ”‚   879.81 ms โ”‚   876.39 ms โ”‚     no change โ”‚
โ”‚ QQuery 9     โ”‚  1165.56 ms โ”‚  1167.48 ms โ”‚     no change โ”‚
โ”‚ QQuery 10    โ”‚   291.55 ms โ”‚   293.01 ms โ”‚     no change โ”‚
โ”‚ QQuery 11    โ”‚   318.77 ms โ”‚   322.61 ms โ”‚     no change โ”‚
โ”‚ QQuery 12    โ”‚   854.77 ms โ”‚   844.91 ms โ”‚     no change โ”‚
โ”‚ QQuery 13    โ”‚  1228.41 ms โ”‚  1205.05 ms โ”‚     no change โ”‚
โ”‚ QQuery 14    โ”‚   795.93 ms โ”‚   780.73 ms โ”‚     no change โ”‚
โ”‚ QQuery 15    โ”‚   809.83 ms โ”‚   797.08 ms โ”‚     no change โ”‚
โ”‚ QQuery 16    โ”‚  1624.25 ms โ”‚  1631.25 ms โ”‚     no change โ”‚
โ”‚ QQuery 17    โ”‚  1610.43 ms โ”‚  1592.79 ms โ”‚     no change โ”‚
โ”‚ QQuery 18    โ”‚  2880.16 ms โ”‚  2972.94 ms โ”‚     no change โ”‚
โ”‚ QQuery 19    โ”‚   126.17 ms โ”‚   122.33 ms โ”‚     no change โ”‚
โ”‚ QQuery 20    โ”‚  1168.29 ms โ”‚  1145.79 ms โ”‚     no change โ”‚
โ”‚ QQuery 21    โ”‚  1332.98 ms โ”‚  1326.28 ms โ”‚     no change โ”‚
โ”‚ QQuery 22    โ”‚  2301.13 ms โ”‚  2296.25 ms โ”‚     no change โ”‚
โ”‚ QQuery 23    โ”‚  7739.93 ms โ”‚  7786.37 ms โ”‚     no change โ”‚
โ”‚ QQuery 24    โ”‚   480.87 ms โ”‚   468.63 ms โ”‚     no change โ”‚
โ”‚ QQuery 25    โ”‚   407.47 ms โ”‚   407.81 ms โ”‚     no change โ”‚
โ”‚ QQuery 26    โ”‚   538.00 ms โ”‚   537.92 ms โ”‚     no change โ”‚
โ”‚ QQuery 27    โ”‚  1622.71 ms โ”‚  1634.71 ms โ”‚     no change โ”‚
โ”‚ QQuery 28    โ”‚ 12496.80 ms โ”‚ 12414.28 ms โ”‚     no change โ”‚
โ”‚ QQuery 29    โ”‚   555.02 ms โ”‚   572.82 ms โ”‚     no change โ”‚
โ”‚ QQuery 30    โ”‚   778.06 ms โ”‚   776.41 ms โ”‚     no change โ”‚
โ”‚ QQuery 31    โ”‚   851.43 ms โ”‚   834.85 ms โ”‚     no change โ”‚
โ”‚ QQuery 32    โ”‚  2531.83 ms โ”‚  2507.16 ms โ”‚     no change โ”‚
โ”‚ QQuery 33    โ”‚  3255.20 ms โ”‚  3232.13 ms โ”‚     no change โ”‚
โ”‚ QQuery 34    โ”‚  3300.27 ms โ”‚  3271.05 ms โ”‚     no change โ”‚
โ”‚ QQuery 35    โ”‚  1250.13 ms โ”‚  1217.49 ms โ”‚     no change โ”‚
โ”‚ QQuery 36    โ”‚   173.22 ms โ”‚   169.50 ms โ”‚     no change โ”‚
โ”‚ QQuery 37    โ”‚   101.32 ms โ”‚   101.13 ms โ”‚     no change โ”‚
โ”‚ QQuery 38    โ”‚   170.56 ms โ”‚   167.80 ms โ”‚     no change โ”‚
โ”‚ QQuery 39    โ”‚   251.74 ms โ”‚   251.91 ms โ”‚     no change โ”‚
โ”‚ QQuery 40    โ”‚    87.40 ms โ”‚    89.54 ms โ”‚     no change โ”‚
โ”‚ QQuery 41    โ”‚    86.72 ms โ”‚    84.18 ms โ”‚     no change โ”‚
โ”‚ QQuery 42    โ”‚    75.34 ms โ”‚    77.39 ms โ”‚     no change โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ Benchmark Summary       โ”ƒ            โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ Total Time (HEAD)       โ”‚ 56118.72ms โ”‚
โ”‚ Total Time (prune-rg)   โ”‚ 55947.49ms โ”‚
โ”‚ Average Time (HEAD)     โ”‚  1305.09ms โ”‚
โ”‚ Average Time (prune-rg) โ”‚  1301.10ms โ”‚
โ”‚ Queries Faster          โ”‚          1 โ”‚
โ”‚ Queries Slower          โ”‚          0 โ”‚
โ”‚ Queries with No Change  โ”‚         42 โ”‚
โ”‚ Queries with Failure    โ”‚          0 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

alamb avatar Jun 16 '25 22:06 alamb

Do we expect the benchmarks to show anything? I don't think they're using dynamic filters right? Maybe we need to merge #15770 and then we can benchmark this?

I want to make sure the overhead of checking the predicates on each incoming batch didn't slow things down

alamb avatar Jun 17 '25 11:06 alamb

Do we expect the benchmarks to show anything? I don't think they're using dynamic filters right? Maybe we need to merge #15770 and then we can benchmark this?

I want to make sure the overhead of checking the predicates on each incoming batch didn't slow things down

If you check the code that only happens if there are dynamic filters. And since there are non right now it becomes just a if let Some(file_pruner) = file_pruner.as_ref() check which is going to be too cheap to show up in benchmarks.

The only way to actually verify will be to merge https://github.com/apache/datafusion/pull/15770 and then compare this PR to main.

adriangb avatar Jun 17 '25 11:06 adriangb

@adriangb I'll review tomorrow, today have some other things

xudong963 avatar Jun 17 '25 12:06 xudong963

@alamb sorry for the ping but would you mind running topk_tpch on here?

adriangb avatar Jun 17 '25 18:06 adriangb

@alamb sorry for the ping but would you mind running topk_tpch on here?

LOL I need to make a webpage (or give you access to the sever to queue the jobs yourself)

alamb avatar Jun 17 '25 19:06 alamb

๐Ÿค– ./gh_compare_branch.sh Benchmark Script Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing prune-rg (54b3bbf3f3a3c96162b7fb95a70f9a2657dbc7d3) to 1429c92474238a91a09f1cd4a68c19d03329b6a7 diff Benchmarks: topk_tpch Results will be posted here when complete

alamb avatar Jun 17 '25 19:06 alamb

@alamb sorry for the ping but would you mind running topk_tpch on here?

LOL I need to make a webpage (or give you access to the sever to queue the jobs yourself)

I was reading that Arrow has requested AWS credits https://lists.apache.org/thread/q33oofy2v3zpg9s9l8o0w68rmjr3ocsv . Perhaps we can utilize one of those for that use case.

Dandandan avatar Jun 17 '25 19:06 Dandandan

๐Ÿค–: Benchmark completed

Details

Comparing HEAD and prune-rg
--------------------
Benchmark run_topk_tpch.json
--------------------
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ Query        โ”ƒ      HEAD โ”ƒ  prune-rg โ”ƒ        Change โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ Q1           โ”‚  26.17 ms โ”‚  33.18 ms โ”‚  1.27x slower โ”‚
โ”‚ Q2           โ”‚  38.44 ms โ”‚  34.00 ms โ”‚ +1.13x faster โ”‚
โ”‚ Q3           โ”‚  97.05 ms โ”‚ 101.20 ms โ”‚     no change โ”‚
โ”‚ Q4           โ”‚  36.71 ms โ”‚  40.95 ms โ”‚  1.12x slower โ”‚
โ”‚ Q5           โ”‚  25.59 ms โ”‚  32.41 ms โ”‚  1.27x slower โ”‚
โ”‚ Q6           โ”‚  54.01 ms โ”‚  54.31 ms โ”‚     no change โ”‚
โ”‚ Q7           โ”‚ 146.60 ms โ”‚ 137.02 ms โ”‚ +1.07x faster โ”‚
โ”‚ Q8           โ”‚  79.27 ms โ”‚  88.55 ms โ”‚  1.12x slower โ”‚
โ”‚ Q9           โ”‚ 102.21 ms โ”‚ 112.97 ms โ”‚  1.11x slower โ”‚
โ”‚ Q10          โ”‚ 174.11 ms โ”‚ 188.49 ms โ”‚  1.08x slower โ”‚
โ”‚ Q11          โ”‚ 103.82 ms โ”‚  91.26 ms โ”‚ +1.14x faster โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ Benchmark Summary       โ”ƒ          โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ Total Time (HEAD)       โ”‚ 883.98ms โ”‚
โ”‚ Total Time (prune-rg)   โ”‚ 914.34ms โ”‚
โ”‚ Average Time (HEAD)     โ”‚  80.36ms โ”‚
โ”‚ Average Time (prune-rg) โ”‚  83.12ms โ”‚
โ”‚ Queries Faster          โ”‚        3 โ”‚
โ”‚ Queries Slower          โ”‚        6 โ”‚
โ”‚ Queries with No Change  โ”‚        2 โ”‚
โ”‚ Queries with Failure    โ”‚        0 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

alamb avatar Jun 17 '25 19:06 alamb

@alamb sorry for the ping but would you mind running topk_tpch on here?

LOL I need to make a webpage (or give you access to the sever to queue the jobs yourself)

I was reading that Arrow has requested / recieved AWS credits https://lists.apache.org/thread/q33oofy2v3zpg9s9l8o0w68rmjr3ocsv . Perhaps we can utilize one of those for that use case.

I tried to ask GCS for credits... they didn't seem excited and ultimately came up with nothing.

adriangb avatar Jun 17 '25 20:06 adriangb

๐Ÿค–: Benchmark completed

Details

Interesting results. I'm inclined to believe that the speedups and slowdowns are both real. We'll have to think about this a bit more.

adriangb avatar Jun 17 '25 20:06 adriangb

@Dandandan @alamb I pushed ebe4196 which adds a very cheap way to track changes to a PhysicalExpr if it's dynamic. I think this will be useful in several places but immediately it gives us the ability to check if the dynamic predicate has been updated before doing the work of re-calculating the pruning predicate, etc.

I'm still not sure it will be cheap enough, but I think it's worth a shot if we can re-run the benches.

It'll be a shame if we can't figure this out, I think if we are able to get this working it mostly negates the unfortunate situation right now that if you have a TopK it might be faster with less parallelism / partitioning upfront. With this change you still open the files but are able to quickly bail out as opposed to having to stream the whole thing.

adriangb avatar Jun 17 '25 21:06 adriangb

I think this will require @Dandandan 's suggestion of only updating the filters if the new ones are more selective: #16433.

Right now since we always update the filters -> it always bumps the generation -> we always re-check.

adriangb avatar Jun 17 '25 23:06 adriangb

@alamb I reverted the filtering during the stream so this should now do strictly less work ๐Ÿ˜„

adriangb avatar Jun 19 '25 20:06 adriangb

Also thank you @xudong963

alamb avatar Jun 24 '25 19:06 alamb

@alamb I added test assertions to confirm the stats are working correctly which addresses https://github.com/apache/datafusion/issues/16402

adriangb avatar Jun 25 '25 15:06 adriangb

@xudong963 @alamb I've re-organized this to incorporate https://github.com/apache/datafusion/pull/16549.

Sadly I did not catch in that PR that we were putting everything in lib.rs which I felt now is too bloated if I put FilePruner in there. So I moved PruningPredicate & co to pruning_predicate.rs - hence the huge diff line count.

I'll also point out that this now only does the extra work if it has either a dynamic filter OR the file has statistics already collected.

adriangb avatar Jun 26 '25 00:06 adriangb

Let's try and get this merged soon to avoid conflicts as much as possible

alamb avatar Jun 26 '25 15:06 alamb

Agreed. I'm struggling with the 3 failing tests because they fail in CI but I can't get them to fail locally...

adriangb avatar Jun 26 '25 15:06 adriangb

Something else we could potentially do is to do the refactor of pruning predicate into its own modules as a separate PR so it would be easier to find the mechanical from the algorithmic changes

alamb avatar Jun 26 '25 18:06 alamb

I feel like it must be a dumb mistake. Give me a bit of time to try to sort it out please. I'll deal with any conflicts.

adriangb avatar Jun 26 '25 19:06 adriangb

@alamb I've found the issue! The files_pruned_statistics metric is not actually the number of files: it is the number of times FileOpener::open was called which may be >1 per file if the file is split up into multiple ranges, which happens in the number of partitions > number of files! So it varies based on number of CPUs.

Options are:

  1. Rename the metric to reflect that it's actually pruning file opens not files, something like file_opens_pruned_statistics or file_ranges_pruned_statistics.
  2. Try to figure out a way to track the actual files pruned. I think this may be a dead end because e.g. a file may be half pruned (one range is scanned one range is pruned).

adriangb avatar Jun 27 '25 14:06 adriangb

What I've done for now is set the target partitions to 1. I think that's reasonable for these tests in general. I opened https://github.com/apache/datafusion/issues/16586 to track renaming the metric.

adriangb avatar Jun 27 '25 15:06 adriangb

@alamb to make this PR easier to review I opened https://github.com/apache/datafusion/pull/16587 which absorbs most of the diff. Once we merge that I'll rebase this and we can even discuss the metric rename here (the diff will be much more readable).

adriangb avatar Jun 27 '25 15:06 adriangb

Rebased and I renamed the metric + added documentation such that this now closes #16586

adriangb avatar Jun 27 '25 17:06 adriangb