datafusion add specialized InList implementations for common scalar types

Closes #18824

Nov 20 '25 00:11 adriangb

@alamb maybe let's run benchmarks?

Nov 20 '25 00:11 adriangb

Here's what I'm seeing so far:

Function & State	main (us)	specialized (us)	Change
in_list_utf8(5) (1024, 0) IN (1, 0)	3.6205	4.8594	Regressed (+34.239%)
in_list_utf8(10) (1024, 0) IN (1, 0)	3.6363	5.1001	Regressed (+40.312%)
in_list_utf8(20) (1024, 0) IN (1, 0)	3.6707	4.9045	Regressed (+33.600%)
in_list_f32 (1024, 0) IN (1, 0)	3.5255	3.2288	Improved (−8.2971%)
in_list_i32 (1024, 0) IN (1, 0)	3.4923	3.6403	Regressed (+3.8772%)
in_list_utf8(5) (1024, 0.2) IN (1, 0)	4.9284	6.3583	Regressed (+29.002%)
in_list_utf8(10) (1024, 0.2) IN (1, 0)	4.9788	6.3273	Regressed (+27.064%)
in_list_utf8(20) (1024, 0.2) IN (1, 0)	5.1145	6.4905	Regressed (+26.907%)
in_list_f32 (1024, 0.2) IN (1, 0)	4.4894	4.2540	Improved (−5.2512%)
in_list_i32 (1024, 0.2) IN (1, 0)	4.4055	4.2869	Improved (−2.7526%)
in_list_utf8(5) (1024, 0) IN (3, 0)	3.5576	4.8733	Regressed (+37.000%)
in_list_utf8(10) (1024, 0) IN (3, 0)	3.4999	4.8737	Regressed (+37.108%)
in_list_utf8(20) (1024, 0) IN (3, 0)	3.4897	4.9039	Regressed (+40.449%)
in_list_f32 (1024, 0) IN (3, 0)	3.5114	3.2550	Improved (−7.3270%)
in_list_i32 (1024, 0) IN (3, 0)	3.5199	3.4879	Noise Threshold (−0.9158%)
in_list_utf8(5) (1024, 0.2) IN (3, 0)	5.0196	6.2232	Regressed (+23.865%)
in_list_utf8(10) (1024, 0.2) IN (3, 0)	5.0453	6.5280	Regressed (+28.004%)
in_list_utf8(20) (1024, 0.2) IN (3, 0)	5.1005	11.124	Regressed (+118.13%)
in_list_f32 (1024, 0.2) IN (3, 0)	4.5049	4.3126	Improved (−4.6367%)
in_list_i32 (1024, 0.2) IN (3, 0)	4.4026	4.4424	Noise Threshold (+1.0449%)
in_list_utf8(5) (1024, 0) IN (10, 0)	3.5275	4.9709	Regressed (+40.969%)
in_list_utf8(10) (1024, 0) IN (10, 0)	3.5613	4.9125	Regressed (+37.386%)
in_list_utf8(20) (1024, 0) IN (10, 0)	3.5589	4.8611	Regressed (+36.578%)
in_list_f32 (1024, 0) IN (10, 0)	3.5596	3.2017	Improved (−10.003%)
in_list_i32 (1024, 0) IN (10, 0)	3.4917	3.5190	Noise Threshold (+1.0216%)
in_list_utf8(5) (1024, 0.2) IN (10, 0)	5.0368	6.6162	Regressed (+31.725%)
in_list_utf8(10) (1024, 0.2) IN (10, 0)	5.0980	6.6660	Regressed (+30.039%)
in_list_utf8(20) (1024, 0.2) IN (10, 0)	5.2543	6.5350	Regressed (+24.335%)
in_list_f32 (1024, 0.2) IN (10, 0)	4.5609	4.3248	Improved (−5.3746%)
in_list_i32 (1024, 0.2) IN (10, 0)	4.4460	4.3543	Improved (−2.7754%)
in_list_utf8(5) (1024, 0) IN (100, 0)	3.6542	4.9992	Regressed (+36.952%)
in_list_utf8(10) (1024, 0) IN (100, 0)	3.6529	4.8772	Regressed (+33.560%)
in_list_utf8(20) (1024, 0) IN (100, 0)	3.6155	5.0250	Regressed (+39.127%)
in_list_f32 (1024, 0) IN (100, 0)	3.6029	3.2448	Improved (−9.7603%)
in_list_i32 (1024, 0) IN (100, 0)	3.6048	3.4770	Improved (−3.5307%)
in_list_utf8(5) (1024, 0.2) IN (100, 0)	5.3988	6.6108	Regressed (+22.866%)
in_list_utf8(10) (1024, 0.2) IN (100, 0)	5.4776	6.6567	Regressed (+21.591%)
in_list_utf8(20) (1024, 0.2) IN (100, 0)	5.3721	6.6470	Regressed (+24.058%)
in_list_f32 (1024, 0.2) IN (100, 0)	4.7693	4.3281	Improved (−9.9814%)
in_list_i32 (1024, 0.2) IN (100, 0)	4.5402	4.4076	Improved (−3.5247%)

I think we'd need to add benchmarks for other primitive types. And it's interesting that Utf8 regresses a lot across the board. I guess the benefits of vectorization / computing the entire hashes at once outweighs the dynamic dispatch?

Nov 20 '25 07:11 adriangb

Thinking about it the trick is probably to avoid the extra to_string()/to_vec() allocation and store just the hashes. Will look into that.

Nov 20 '25 08:11 adriangb

🤖 ./gh_compare_branch.sh Benchmark Script Running Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing specialize (1e4782f1cde36a83398002a4730ff96123b547a3) to 7d8b8602ad1be2f61f6a8ebb253ace9d85304ea7 diff using: tpch_mem clickbench_partitioned clickbench_extended Results will be posted here when complete

Nov 22 '25 10:11 alamb

🤖: Benchmark completed

Details

Comparing HEAD and specialize
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃  specialize ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │  2653.29 ms │  2643.63 ms │    no change │
│ QQuery 1     │  1306.60 ms │  1281.59 ms │    no change │
│ QQuery 2     │  2455.20 ms │  2370.32 ms │    no change │
│ QQuery 3     │  1156.08 ms │  1114.79 ms │    no change │
│ QQuery 4     │  2330.46 ms │  2304.88 ms │    no change │
│ QQuery 5     │ 28126.46 ms │ 28010.99 ms │    no change │
│ QQuery 6     │  4067.69 ms │  4060.48 ms │    no change │
│ QQuery 7     │  3638.82 ms │  3851.00 ms │ 1.06x slower │
└──────────────┴─────────────┴─────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)         │ 45734.60ms │
│ Total Time (specialize)   │ 45637.68ms │
│ Average Time (HEAD)       │  5716.82ms │
│ Average Time (specialize) │  5704.71ms │
│ Queries Faster            │          0 │
│ Queries Slower            │          1 │
│ Queries with No Change    │          7 │
│ Queries with Failure      │          0 │
└───────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃  specialize ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.51 ms │     2.23 ms │ +1.13x faster │
│ QQuery 1     │    51.64 ms │    50.17 ms │     no change │
│ QQuery 2     │   136.71 ms │   138.99 ms │     no change │
│ QQuery 3     │   164.21 ms │   165.13 ms │     no change │
│ QQuery 4     │  1070.98 ms │  1088.91 ms │     no change │
│ QQuery 5     │  1510.36 ms │  1485.90 ms │     no change │
│ QQuery 6     │     2.17 ms │     2.20 ms │     no change │
│ QQuery 7     │    55.51 ms │    55.89 ms │     no change │
│ QQuery 8     │  1426.35 ms │  1446.26 ms │     no change │
│ QQuery 9     │  1872.18 ms │  1829.87 ms │     no change │
│ QQuery 10    │   380.85 ms │   367.57 ms │     no change │
│ QQuery 11    │   458.71 ms │   417.23 ms │ +1.10x faster │
│ QQuery 12    │  1408.78 ms │  1382.73 ms │     no change │
│ QQuery 13    │  2081.78 ms │  2148.16 ms │     no change │
│ QQuery 14    │  1242.39 ms │  1286.95 ms │     no change │
│ QQuery 15    │  1216.37 ms │  1262.77 ms │     no change │
│ QQuery 16    │  2679.04 ms │  2730.67 ms │     no change │
│ QQuery 17    │  2657.52 ms │  2666.04 ms │     no change │
│ QQuery 18    │  5347.62 ms │  4996.68 ms │ +1.07x faster │
│ QQuery 19    │   129.19 ms │   126.78 ms │     no change │
│ QQuery 20    │  2020.18 ms │  1974.03 ms │     no change │
│ QQuery 21    │  2330.14 ms │  2291.15 ms │     no change │
│ QQuery 22    │  3990.08 ms │  3921.02 ms │     no change │
│ QQuery 23    │ 12864.85 ms │ 12678.44 ms │     no change │
│ QQuery 24    │   214.91 ms │   197.85 ms │ +1.09x faster │
│ QQuery 25    │   477.06 ms │   465.64 ms │     no change │
│ QQuery 26    │   221.72 ms │   211.14 ms │     no change │
│ QQuery 27    │  2818.62 ms │  2800.08 ms │     no change │
│ QQuery 28    │ 23386.04 ms │ 24032.36 ms │     no change │
│ QQuery 29    │   963.31 ms │   964.57 ms │     no change │
│ QQuery 30    │  1318.41 ms │  1329.87 ms │     no change │
│ QQuery 31    │  1378.81 ms │  1352.16 ms │     no change │
│ QQuery 32    │  4881.46 ms │  5040.19 ms │     no change │
│ QQuery 33    │  5905.32 ms │  5981.60 ms │     no change │
│ QQuery 34    │  6119.86 ms │  6348.76 ms │     no change │
│ QQuery 35    │  1942.66 ms │  1904.86 ms │     no change │
│ QQuery 36    │   120.95 ms │   120.81 ms │     no change │
│ QQuery 37    │    52.95 ms │    53.06 ms │     no change │
│ QQuery 38    │   121.94 ms │   120.66 ms │     no change │
│ QQuery 39    │   196.99 ms │   198.57 ms │     no change │
│ QQuery 40    │    43.90 ms │    41.87 ms │     no change │
│ QQuery 41    │    39.75 ms │    37.46 ms │ +1.06x faster │
│ QQuery 42    │    32.87 ms │    31.87 ms │     no change │
└──────────────┴─────────────┴─────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)         │ 95337.65ms │
│ Total Time (specialize)   │ 95749.15ms │
│ Average Time (HEAD)       │  2217.15ms │
│ Average Time (specialize) │  2226.72ms │
│ Queries Faster            │          5 │
│ Queries Slower            │          0 │
│ Queries with No Change    │         38 │
│ Queries with Failure      │          0 │
└───────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ specialize ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 139.55 ms │  138.68 ms │     no change │
│ QQuery 2     │  27.47 ms │   28.88 ms │  1.05x slower │
│ QQuery 3     │  38.06 ms │   37.65 ms │     no change │
│ QQuery 4     │  29.40 ms │   28.38 ms │     no change │
│ QQuery 5     │  87.81 ms │   86.86 ms │     no change │
│ QQuery 6     │  19.52 ms │   19.58 ms │     no change │
│ QQuery 7     │ 222.64 ms │  227.15 ms │     no change │
│ QQuery 8     │  34.19 ms │   35.26 ms │     no change │
│ QQuery 9     │ 102.37 ms │  105.70 ms │     no change │
│ QQuery 10    │  62.63 ms │   64.92 ms │     no change │
│ QQuery 11    │  18.73 ms │   18.59 ms │     no change │
│ QQuery 12    │  51.03 ms │   52.34 ms │     no change │
│ QQuery 13    │  48.65 ms │   48.66 ms │     no change │
│ QQuery 14    │  13.94 ms │   13.77 ms │     no change │
│ QQuery 15    │  25.10 ms │   24.91 ms │     no change │
│ QQuery 16    │  25.28 ms │   24.76 ms │     no change │
│ QQuery 17    │ 153.69 ms │  150.64 ms │     no change │
│ QQuery 18    │ 277.48 ms │  281.61 ms │     no change │
│ QQuery 19    │  36.61 ms │   36.42 ms │     no change │
│ QQuery 20    │  50.79 ms │   48.40 ms │     no change │
│ QQuery 21    │ 338.03 ms │  318.12 ms │ +1.06x faster │
│ QQuery 22    │  20.42 ms │   17.44 ms │ +1.17x faster │
└──────────────┴───────────┴────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary         ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)         │ 1823.38ms │
│ Total Time (specialize)   │ 1808.75ms │
│ Average Time (HEAD)       │   82.88ms │
│ Average Time (specialize) │   82.22ms │
│ Queries Faster            │         2 │
│ Queries Slower            │         1 │
│ Queries with No Change    │        19 │
│ Queries with Failure      │         0 │
└───────────────────────────┴───────────┘

Nov 22 '25 11:11 alamb

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing specialize (1e4782f1cde36a83398002a4730ff96123b547a3) to 7d8b8602ad1be2f61f6a8ebb253ace9d85304ea7 diff BENCH_NAME=in_list BENCH_COMMAND=cargo bench --bench in_list BENCH_FILTER= BENCH_BRANCH_NAME=specialize Results will be posted here when complete

Nov 22 '25 12:11 alamb

🤖: Benchmark completed

Details

group                                       main                                   specialize
-----                                       ----                                   ----------
in_list_f32 (1024, 0) IN (1, 0)             1.16      4.2±0.01µs        ? ?/sec    1.00      3.6±0.02µs        ? ?/sec
in_list_f32 (1024, 0) IN (10, 0)            1.16      4.2±0.01µs        ? ?/sec    1.00      3.6±0.02µs        ? ?/sec
in_list_f32 (1024, 0) IN (100, 0)           1.16      4.3±0.01µs        ? ?/sec    1.00      3.7±0.01µs        ? ?/sec
in_list_f32 (1024, 0) IN (3, 0)             1.17      4.3±0.10µs        ? ?/sec    1.00      3.6±0.01µs        ? ?/sec
in_list_f32 (1024, 0.2) IN (1, 0)           1.00      5.1±0.02µs        ? ?/sec    1.13      5.8±0.07µs        ? ?/sec
in_list_f32 (1024, 0.2) IN (10, 0)          1.00      5.3±0.03µs        ? ?/sec    1.09      5.7±0.02µs        ? ?/sec
in_list_f32 (1024, 0.2) IN (100, 0)         1.00      5.3±0.04µs        ? ?/sec    1.13      5.9±0.10µs        ? ?/sec
in_list_f32 (1024, 0.2) IN (3, 0)           1.00      5.2±0.04µs        ? ?/sec    1.12      5.8±0.05µs        ? ?/sec
in_list_i32 (1024, 0) IN (1, 0)             1.00      4.3±0.01µs        ? ?/sec    1.11      4.7±0.02µs        ? ?/sec
in_list_i32 (1024, 0) IN (10, 0)            1.00      4.2±0.01µs        ? ?/sec    1.11      4.7±0.04µs        ? ?/sec
in_list_i32 (1024, 0) IN (100, 0)           1.00      4.3±0.01µs        ? ?/sec    1.11      4.7±0.01µs        ? ?/sec
in_list_i32 (1024, 0) IN (3, 0)             1.00      4.3±0.01µs        ? ?/sec    1.11      4.7±0.07µs        ? ?/sec
in_list_i32 (1024, 0.2) IN (1, 0)           1.00      5.2±0.02µs        ? ?/sec    1.32      6.8±0.39µs        ? ?/sec
in_list_i32 (1024, 0.2) IN (10, 0)          1.00      5.2±0.02µs        ? ?/sec    1.25      6.5±0.03µs        ? ?/sec
in_list_i32 (1024, 0.2) IN (100, 0)         1.00      5.2±0.03µs        ? ?/sec    1.16      6.1±0.06µs        ? ?/sec
in_list_i32 (1024, 0.2) IN (3, 0)           1.00      5.2±0.02µs        ? ?/sec    1.30      6.8±0.03µs        ? ?/sec
in_list_utf8(10) (1024, 0) IN (1, 0)        1.16      4.5±0.02µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(10) (1024, 0) IN (10, 0)       1.16      4.5±0.01µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(10) (1024, 0) IN (100, 0)      1.16      4.5±0.03µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(10) (1024, 0) IN (3, 0)        1.16      4.5±0.04µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(10) (1024, 0.2) IN (1, 0)      1.06      5.9±0.03µs        ? ?/sec    1.00      5.6±0.04µs        ? ?/sec
in_list_utf8(10) (1024, 0.2) IN (10, 0)     1.09      6.0±0.06µs        ? ?/sec    1.00      5.5±0.03µs        ? ?/sec
in_list_utf8(10) (1024, 0.2) IN (100, 0)    1.07      6.0±0.03µs        ? ?/sec    1.00      5.6±0.03µs        ? ?/sec
in_list_utf8(10) (1024, 0.2) IN (3, 0)      1.10      6.1±0.04µs        ? ?/sec    1.00      5.6±0.03µs        ? ?/sec
in_list_utf8(20) (1024, 0) IN (1, 0)        1.16      4.5±0.02µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(20) (1024, 0) IN (10, 0)       1.16      4.5±0.01µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(20) (1024, 0) IN (100, 0)      1.15      4.5±0.04µs        ? ?/sec    1.00      3.9±0.03µs        ? ?/sec
in_list_utf8(20) (1024, 0) IN (3, 0)        1.14      4.6±0.01µs        ? ?/sec    1.00      4.0±0.03µs        ? ?/sec
in_list_utf8(20) (1024, 0.2) IN (1, 0)      1.06      6.0±0.02µs        ? ?/sec    1.00      5.7±0.03µs        ? ?/sec
in_list_utf8(20) (1024, 0.2) IN (10, 0)     1.07      6.0±0.03µs        ? ?/sec    1.00      5.6±0.03µs        ? ?/sec
in_list_utf8(20) (1024, 0.2) IN (100, 0)    1.09      6.1±0.03µs        ? ?/sec    1.00      5.6±0.02µs        ? ?/sec
in_list_utf8(20) (1024, 0.2) IN (3, 0)      1.04      6.1±0.03µs        ? ?/sec    1.00      5.8±0.05µs        ? ?/sec
in_list_utf8(5) (1024, 0) IN (1, 0)         1.15      4.5±0.01µs        ? ?/sec    1.00      4.0±0.01µs        ? ?/sec
in_list_utf8(5) (1024, 0) IN (10, 0)        1.16      4.5±0.03µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(5) (1024, 0) IN (100, 0)       1.16      4.5±0.01µs        ? ?/sec    1.00      3.9±0.00µs        ? ?/sec
in_list_utf8(5) (1024, 0) IN (3, 0)         1.14      4.5±0.03µs        ? ?/sec    1.00      4.0±0.02µs        ? ?/sec
in_list_utf8(5) (1024, 0.2) IN (1, 0)       1.06      5.9±0.04µs        ? ?/sec    1.00      5.5±0.04µs        ? ?/sec
in_list_utf8(5) (1024, 0.2) IN (10, 0)      1.10      6.1±0.04µs        ? ?/sec    1.00      5.5±0.03µs        ? ?/sec
in_list_utf8(5) (1024, 0.2) IN (100, 0)     1.11      6.0±0.03µs        ? ?/sec    1.00      5.4±0.03µs        ? ?/sec
in_list_utf8(5) (1024, 0.2) IN (3, 0)       1.11      6.1±0.06µs        ? ?/sec    1.00      5.5±0.04µs        ? ?/sec

Nov 22 '25 12:11 alamb

🤖: Benchmark completed

Details

group                                       main                                   specialize
-----                                       ----                                   ----------
in_list_f32 (1024, 0) IN (1, 0)             1.16      4.2±0.01µs        ? ?/sec    1.00      3.6±0.02µs        ? ?/sec
in_list_f32 (1024, 0) IN (10, 0)            1.16      4.2±0.01µs        ? ?/sec    1.00      3.6±0.02µs        ? ?/sec
in_list_f32 (1024, 0) IN (100, 0)           1.16      4.3±0.01µs        ? ?/sec    1.00      3.7±0.01µs        ? ?/sec
in_list_f32 (1024, 0) IN (3, 0)             1.17      4.3±0.10µs        ? ?/sec    1.00      3.6±0.01µs        ? ?/sec
in_list_f32 (1024, 0.2) IN (1, 0)           1.00      5.1±0.02µs        ? ?/sec    1.13      5.8±0.07µs        ? ?/sec
in_list_f32 (1024, 0.2) IN (10, 0)          1.00      5.3±0.03µs        ? ?/sec    1.09      5.7±0.02µs        ? ?/sec
in_list_f32 (1024, 0.2) IN (100, 0)         1.00      5.3±0.04µs        ? ?/sec    1.13      5.9±0.10µs        ? ?/sec
in_list_f32 (1024, 0.2) IN (3, 0)           1.00      5.2±0.04µs        ? ?/sec    1.12      5.8±0.05µs        ? ?/sec
in_list_i32 (1024, 0) IN (1, 0)             1.00      4.3±0.01µs        ? ?/sec    1.11      4.7±0.02µs        ? ?/sec
in_list_i32 (1024, 0) IN (10, 0)            1.00      4.2±0.01µs        ? ?/sec    1.11      4.7±0.04µs        ? ?/sec
in_list_i32 (1024, 0) IN (100, 0)           1.00      4.3±0.01µs        ? ?/sec    1.11      4.7±0.01µs        ? ?/sec
in_list_i32 (1024, 0) IN (3, 0)             1.00      4.3±0.01µs        ? ?/sec    1.11      4.7±0.07µs        ? ?/sec
in_list_i32 (1024, 0.2) IN (1, 0)           1.00      5.2±0.02µs        ? ?/sec    1.32      6.8±0.39µs        ? ?/sec
in_list_i32 (1024, 0.2) IN (10, 0)          1.00      5.2±0.02µs        ? ?/sec    1.25      6.5±0.03µs        ? ?/sec
in_list_i32 (1024, 0.2) IN (100, 0)         1.00      5.2±0.03µs        ? ?/sec    1.16      6.1±0.06µs        ? ?/sec
in_list_i32 (1024, 0.2) IN (3, 0)           1.00      5.2±0.02µs        ? ?/sec    1.30      6.8±0.03µs        ? ?/sec
in_list_utf8(10) (1024, 0) IN (1, 0)        1.16      4.5±0.02µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(10) (1024, 0) IN (10, 0)       1.16      4.5±0.01µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(10) (1024, 0) IN (100, 0)      1.16      4.5±0.03µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(10) (1024, 0) IN (3, 0)        1.16      4.5±0.04µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(10) (1024, 0.2) IN (1, 0)      1.06      5.9±0.03µs        ? ?/sec    1.00      5.6±0.04µs        ? ?/sec
in_list_utf8(10) (1024, 0.2) IN (10, 0)     1.09      6.0±0.06µs        ? ?/sec    1.00      5.5±0.03µs        ? ?/sec
in_list_utf8(10) (1024, 0.2) IN (100, 0)    1.07      6.0±0.03µs        ? ?/sec    1.00      5.6±0.03µs        ? ?/sec
in_list_utf8(10) (1024, 0.2) IN (3, 0)      1.10      6.1±0.04µs        ? ?/sec    1.00      5.6±0.03µs        ? ?/sec
in_list_utf8(20) (1024, 0) IN (1, 0)        1.16      4.5±0.02µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(20) (1024, 0) IN (10, 0)       1.16      4.5±0.01µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(20) (1024, 0) IN (100, 0)      1.15      4.5±0.04µs        ? ?/sec    1.00      3.9±0.03µs        ? ?/sec
in_list_utf8(20) (1024, 0) IN (3, 0)        1.14      4.6±0.01µs        ? ?/sec    1.00      4.0±0.03µs        ? ?/sec
in_list_utf8(20) (1024, 0.2) IN (1, 0)      1.06      6.0±0.02µs        ? ?/sec    1.00      5.7±0.03µs        ? ?/sec
in_list_utf8(20) (1024, 0.2) IN (10, 0)     1.07      6.0±0.03µs        ? ?/sec    1.00      5.6±0.03µs        ? ?/sec
in_list_utf8(20) (1024, 0.2) IN (100, 0)    1.09      6.1±0.03µs        ? ?/sec    1.00      5.6±0.02µs        ? ?/sec
in_list_utf8(20) (1024, 0.2) IN (3, 0)      1.04      6.1±0.03µs        ? ?/sec    1.00      5.8±0.05µs        ? ?/sec
in_list_utf8(5) (1024, 0) IN (1, 0)         1.15      4.5±0.01µs        ? ?/sec    1.00      4.0±0.01µs        ? ?/sec
in_list_utf8(5) (1024, 0) IN (10, 0)        1.16      4.5±0.03µs        ? ?/sec    1.00      3.9±0.01µs        ? ?/sec
in_list_utf8(5) (1024, 0) IN (100, 0)       1.16      4.5±0.01µs        ? ?/sec    1.00      3.9±0.00µs        ? ?/sec
in_list_utf8(5) (1024, 0) IN (3, 0)         1.14      4.5±0.03µs        ? ?/sec    1.00      4.0±0.02µs        ? ?/sec
in_list_utf8(5) (1024, 0.2) IN (1, 0)       1.06      5.9±0.04µs        ? ?/sec    1.00      5.5±0.04µs        ? ?/sec
in_list_utf8(5) (1024, 0.2) IN (10, 0)      1.10      6.1±0.04µs        ? ?/sec    1.00      5.5±0.03µs        ? ?/sec
in_list_utf8(5) (1024, 0.2) IN (100, 0)     1.11      6.0±0.03µs        ? ?/sec    1.00      5.4±0.03µs        ? ?/sec
in_list_utf8(5) (1024, 0.2) IN (3, 0)       1.11      6.1±0.06µs        ? ?/sec    1.00      5.5±0.04µs        ? ?/sec

Results seem a bit mixed?

Nov 23 '25 06:11 Dandandan

Yes. Slowdowns for i32 are concerning. I won’t merge this until it’s all speedups or neutral. I may also make a support PR to add more benchmarks for other types so we can make better comparisons.

Nov 23 '25 06:11 adriangb

I am hoping to look at this PR today or tomorrow and help push it along

Dec 02 '25 12:12 alamb

I realized this new implementation differs from the existing one in ways which fix some bugs. I opened https://github.com/apache/datafusion/pull/19050 to demonstrate those bugs. So maybe lets first fix the bugs and merge the new tests, then we can continue here?

Dec 02 '25 18:12 adriangb

General comment on the benchmark but... am I reading them wrong, or is the null_percent input logic inverted?

fn do_benches(
    c: &mut Criterion,
    array_length: usize,
    in_list_length: usize,
    null_percent: f64,
) {
...
    let values: Int32Array = (0..array_length)
        .map(|_| rng.random_bool(null_percent).then(|| rng.random()))
        .collect();

    let in_list: Vec<_> = (0..in_list_length)
        .map(|_| ScalarValue::Int32(Some(rng.random())))
        .collect();

    do_bench(
        c,
        &format!("in_list_i32 ({array_length}, {null_percent}) IN ({in_list_length}, 0)"),
        Arc::new(values),
        &in_list,
    )

When null_percent is 0, won't this just create a array_length long values array of NULLs?

I think it should rather be something like:

fn do_benches(
    c: &mut Criterion,
    array_length: usize,
    in_list_length: usize,
    null_percent: f64,
) {
    let non_null_percent = 1.0 - null_percent;
...
    let values: Int32Array = (0..array_length)
        .map(|_| rng.random_bool(non_null_percent).then(|| rng.random()))
        .collect();

    let in_list: Vec<_> = (0..in_list_length)
        .map(|_| ScalarValue::Int32(Some(rng.random())))
        .collect();

    do_bench(
        c,
        &format!("in_list_i32 ({array_length}, {null_percent}) IN ({in_list_length}, 0)"),
        Arc::new(values),
        &in_list,
    )

EDIT: I opened https://github.com/apache/datafusion/pull/19204 to fix this.

Dec 08 '25 09:12 geoffreyclaude

Another general comment, on the implementation this time: hashing seems overkill and probably overly expensive for small simple type lists.

@adriangb have you considered sorting the InList and doing a binary search?

For small lists (threshold to be refined...) this could be orders of magnitude faster that the overhead of hashing. To validate with the new fixed benchmarks of course!

Dec 08 '25 09:12 geoffreyclaude

Another general comment, on the implementation this time: hashing seems overkill and probably overly expensive for small simple type lists.

@adriangb have you considered sorting the InList and doing a binary search?

For small lists (threshold to be refined...) this could be orders of magnitude faster that the overhead of hashing. To validate with the new fixed benchmarks of course!

I think that's a really neat idea! I did look into using binary search at some point (not this PR) but then realized that there was already the hashing in place and didn't pursue it further. So while I agree it could be interesting to have two approaches, I think we should focus on improving the current approach right now especially since we probably degraded performance since last release. Maybe let's file a ticket to track benchmarking that approach?

Dec 08 '25 17:12 adriangb

Another general comment, on the implementation this time: hashing seems overkill and probably overly expensive for small simple type lists. @adriangb have you considered sorting the InList and doing a binary search? For small lists (threshold to be refined...) this could be orders of magnitude faster that the overhead of hashing. To validate with the new fixed benchmarks of course!

I think that's a really neat idea! I did look into using binary search at some point (not this PR) but then realized that there was already the hashing in place and didn't pursue it further. So while I agree it could be interesting to have two approaches, I think we should focus on improving the current approach right now especially since we probably degraded performance since last release. Maybe let's file a ticket to track benchmarking that approach?

This was too much fun to skip the opportunity! I've opened a draft PR on your branch that shows pretty cool results: https://github.com/pydantic/datafusion/pull/46

Dec 08 '25 18:12 geoffreyclaude

run benchmarks in_list

Dec 09 '25 21:12 adriangb

🤖 Hi @adriangb, thanks for the request (https://github.com/apache/datafusion/pull/18832#issuecomment-3634442805).

scrape_comments.py only supports whitelisted benchmarks.

Standard: clickbench_1, clickbench_extended, clickbench_partitioned, clickbench_pushdown, tpch, tpch10, tpch_mem, tpch_mem10
Criterion: aggregate_query_sql, aggregate_vectorized, case_when, in_list, sql_planner

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...

Dec 09 '25 21:12 alamb-ghbot

~ hey real @alamb could you whitelist in_list for me (might as well do Daniel and Geoffrey at the same time) ~ nvm user error

Dec 09 '25 22:12 adriangb

run benchmark in_list

Dec 09 '25 22:12 adriangb

🤖 ./gh_compare_branch_bench.sh compare_branch_bench.sh Running Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing specialize (d11d7ae6d79acc1cc4e815fd01e37bce6afae0ad) to e8a0829b7a2044ab477784d60ed2d208b97f5042 diff BENCH_NAME=in_list BENCH_COMMAND=cargo bench --all-features --bench in_list BENCH_FILTER= BENCH_BRANCH_NAME=specialize Results will be posted here when complete

Dec 09 '25 22:12 alamb-ghbot

~ hey real @alamb could you whitelist in_list for me (might as well do Daniel and Geoffrey at the same time) ~ nvm user error

@Dandandan is already on the list https://github.com/alamb/datafusion-benchmarking/blob/4fb120785fa66ecbf40a45d8a5d0d5f4be17266a/scripts/scrape_comments.py#L41

I can add @geoffreyclaude if he would like

Dec 09 '25 22:12 alamb

🤖 Hi @adriangb, thanks for the request (https://github.com/apache/datafusion/pull/18832#issuecomment-3634468395).

scrape_comments.py only supports whitelisted benchmarks.

Standard: clickbench_1, clickbench_extended, clickbench_partitioned, clickbench_pushdown, tpch, tpch10, tpch_mem, tpch_mem10
Criterion: aggregate_query_sql, aggregate_vectorized, case_when, in_list, sql_planner

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...

Dec 09 '25 22:12 alamb-ghbot

run benchmarks in_list

Dec 09 '25 22:12 alamb

🤖 Hi @alamb, thanks for the request (https://github.com/apache/datafusion/pull/18832#issuecomment-3634537320).

scrape_comments.py only supports whitelisted benchmarks.

Standard: clickbench_1, clickbench_extended, clickbench_partitioned, clickbench_pushdown, tpch, tpch10, tpch_mem, tpch_mem10
Criterion: aggregate_query_sql, aggregate_vectorized, case_when, in_list, sql_planner

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...

Dec 09 '25 22:12 alamb-ghbot

The key is benchmark vs benchmarks fyi. I can make a PR to accept both tomorrow.

Dec 09 '25 22:12 adriangb

(I fixed the script to support

The key is benchmark vs benchmarks fyi. I can make a PR to accept both tomorrow.

Yeah, I thought I could bash it out quickly but sadly it took more than 30 seconds

Dec 09 '25 22:12 alamb

🤖: Benchmark completed

Details

group                                          main                                   specialize
-----                                          ----                                   ----------
in_list/Float32/list=100/nulls=0%              1.01    307.1±8.23µs        ? ?/sec    1.00    302.7±1.87µs        ? ?/sec
in_list/Float32/list=100/nulls=20%             1.04    391.1±2.13µs        ? ?/sec    1.00    374.4±4.60µs        ? ?/sec
in_list/Float32/list=3/nulls=0%                1.03     14.6±0.17µs        ? ?/sec    1.00     14.2±0.11µs        ? ?/sec
in_list/Float32/list=3/nulls=20%               1.00     19.7±0.15µs        ? ?/sec    1.00     19.7±0.13µs        ? ?/sec
in_list/Float32/list=8/nulls=0%                1.00     29.1±0.36µs        ? ?/sec    1.00     29.2±1.14µs        ? ?/sec
in_list/Float32/list=8/nulls=20%               1.07     39.4±0.27µs        ? ?/sec    1.00     36.8±0.47µs        ? ?/sec
in_list/Int32/list=100/nulls=0%                1.00      5.9±0.13µs        ? ?/sec    1.06      6.2±0.04µs        ? ?/sec
in_list/Int32/list=100/nulls=20%               1.06      8.4±0.08µs        ? ?/sec    1.00      7.9±0.11µs        ? ?/sec
in_list/Int32/list=3/nulls=0%                  1.00      5.6±0.06µs        ? ?/sec    1.21      6.8±0.04µs        ? ?/sec
in_list/Int32/list=3/nulls=20%                 1.05      8.7±0.07µs        ? ?/sec    1.00      8.3±0.11µs        ? ?/sec
in_list/Int32/list=8/nulls=0%                  1.00      6.0±0.04µs        ? ?/sec    1.09      6.5±0.16µs        ? ?/sec
in_list/Int32/list=8/nulls=20%                 1.04      8.4±0.10µs        ? ?/sec    1.00      8.1±0.04µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=100         1.00   645.1±11.45µs        ? ?/sec    1.00   645.7±12.49µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=12          1.00    667.7±6.12µs        ? ?/sec    1.03    685.0±3.93µs        ? ?/sec
in_list/Utf8/list=100/nulls=0%/str=3           1.00    669.3±4.02µs        ? ?/sec    1.06    707.1±7.55µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=100        1.00    594.9±3.67µs        ? ?/sec    1.02   604.8±12.27µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=12         1.00    613.4±6.72µs        ? ?/sec    1.02    627.5±5.92µs        ? ?/sec
in_list/Utf8/list=100/nulls=20%/str=3          1.00    615.7±2.78µs        ? ?/sec    1.02    627.1±3.23µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=100           1.01     27.3±0.97µs        ? ?/sec    1.00     27.0±1.15µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=12            1.06     27.4±0.34µs        ? ?/sec    1.00     25.9±0.40µs        ? ?/sec
in_list/Utf8/list=3/nulls=0%/str=3             1.00     28.3±1.19µs        ? ?/sec    1.04     29.4±3.04µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=100          1.02     26.2±0.26µs        ? ?/sec    1.00     25.8±0.51µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=12           1.00     27.1±0.10µs        ? ?/sec    1.05     28.5±1.12µs        ? ?/sec
in_list/Utf8/list=3/nulls=20%/str=3            1.04     27.7±0.75µs        ? ?/sec    1.00     26.5±0.55µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=100           1.00     56.7±1.89µs        ? ?/sec    1.00     56.4±0.99µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=12            1.00     57.8±2.60µs        ? ?/sec    1.02     59.2±0.56µs        ? ?/sec
in_list/Utf8/list=8/nulls=0%/str=3             1.00     58.7±0.24µs        ? ?/sec    1.09     64.3±6.59µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=100          1.03     53.4±1.32µs        ? ?/sec    1.00     51.9±0.10µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=12           1.00     54.2±1.31µs        ? ?/sec    1.00     54.5±0.69µs        ? ?/sec
in_list/Utf8/list=8/nulls=20%/str=3            1.01     54.6±0.20µs        ? ?/sec    1.00     53.8±0.53µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=100     1.00    597.3±9.22µs        ? ?/sec    1.01   603.2±10.62µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=12      1.00   581.4±24.63µs        ? ?/sec    1.00   581.1±13.15µs        ? ?/sec
in_list/Utf8View/list=100/nulls=0%/str=3       1.01   582.3±12.05µs        ? ?/sec    1.00    578.6±5.55µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=100    1.00    503.3±5.49µs        ? ?/sec    1.02   515.1±27.03µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=12     1.00    510.9±6.95µs        ? ?/sec    1.01    515.3±6.38µs        ? ?/sec
in_list/Utf8View/list=100/nulls=20%/str=3      1.00    515.1±7.18µs        ? ?/sec    1.01   522.1±13.63µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=100       1.05     22.2±0.28µs        ? ?/sec    1.00     21.1±0.24µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=12        1.05     23.3±0.62µs        ? ?/sec    1.00     22.3±0.30µs        ? ?/sec
in_list/Utf8View/list=3/nulls=0%/str=3         1.04     23.3±0.18µs        ? ?/sec    1.00     22.4±0.82µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=100      1.00     22.8±0.22µs        ? ?/sec    1.02     23.2±0.24µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=12       1.00     23.4±0.18µs        ? ?/sec    1.03     24.2±0.49µs        ? ?/sec
in_list/Utf8View/list=3/nulls=20%/str=3        1.00     23.7±0.37µs        ? ?/sec    1.02     24.2±1.51µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=100       1.05     49.6±3.83µs        ? ?/sec    1.00     47.2±0.83µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=12        1.00     50.5±0.65µs        ? ?/sec    1.18     59.7±1.62µs        ? ?/sec
in_list/Utf8View/list=8/nulls=0%/str=3         1.00     50.6±0.72µs        ? ?/sec    1.17     59.5±0.70µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=100      1.00     45.4±2.06µs        ? ?/sec    1.06     48.2±1.59µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=12       1.01     47.3±0.67µs        ? ?/sec    1.00     46.8±0.83µs        ? ?/sec
in_list/Utf8View/list=8/nulls=20%/str=3        1.01     49.1±0.36µs        ? ?/sec    1.00     48.6±1.29µs        ? ?/sec

Dec 09 '25 22:12 alamb-ghbot

@Dandandan is already on the list https://github.com/alamb/datafusion-benchmarking/blob/4fb120785fa66ecbf40a45d8a5d0d5f4be17266a/scripts/scrape_comments.py#L41

I can add @geoffreyclaude if he would like

@alamb yes please :) Especially for when/if I follow through with the additional perf improvements.

Dec 10 '25 14:12 geoffreyclaude

run benchmarks in_list

Dec 10 '25 14:12 adriangb

🤖 Hi @adriangb, thanks for the request (https://github.com/apache/datafusion/pull/18832#issuecomment-3637437969).

scrape_comments.py only supports whitelisted benchmarks.

Standard: clickbench_1, clickbench_extended, clickbench_partitioned, clickbench_pushdown, tpch, tpch10, tpch_mem, tpch_mem10
Criterion: aggregate_query_sql, aggregate_vectorized, case_when, in_list, sql_planner

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...

Dec 10 '25 14:12 alamb-ghbot