Enable reading `StringViewArray` by default from Parquet
Draft as it builds on:
- [x] Debug tests
- [x] File issues found
- [x] https://github.com/apache/datafusion/pull/12724
- [x] https://github.com/apache/datafusion/pull/12792
- [ ] https://github.com/apache/datafusion/pull/12816 @goldmedal
- [x] https://github.com/apache/datafusion/pull/12809 from @Rachelint
- [x] Fix slowdown with querying
hits_partitioned
Which issue does this PR close?
Closes https://github.com/apache/datafusion/issues/11682
Rationale for this change
Reading data as StringViewArray is significantly faster than StringArray. We have been testing this behind a feature flag but it is now stable enough to enable by default.
See blog post https://github.com/apache/datafusion/issues/11603:
- https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/
- https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/
Benchmark Results
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main_base ┃ alamb_enable_string_view_by_def… ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │ 0.66ms │ 0.67ms │ no change │
│ QQuery 1 │ 72.07ms │ 70.10ms │ no change │
│ QQuery 2 │ 119.92ms │ 124.84ms │ no change │
│ QQuery 3 │ 128.29ms │ 134.03ms │ no change │
│ QQuery 4 │ 964.03ms │ 974.14ms │ no change │
│ QQuery 5 │ 1088.23ms │ 1060.44ms │ no change │
│ QQuery 6 │ 66.53ms │ 64.39ms │ no change │
│ QQuery 7 │ 83.84ms │ 80.44ms │ no change │
│ QQuery 8 │ 1484.01ms │ 1466.81ms │ no change │
│ QQuery 9 │ 1366.17ms │ 1378.33ms │ no change │
│ QQuery 10 │ 465.55ms │ 362.07ms │ +1.29x faster │
│ QQuery 11 │ 514.19ms │ 398.94ms │ +1.29x faster │
│ QQuery 12 │ 1195.47ms │ 1098.46ms │ +1.09x faster │
│ QQuery 13 │ 2183.91ms │ 2083.70ms │ no change │
│ QQuery 14 │ 1641.36ms │ 1551.03ms │ +1.06x faster │
│ QQuery 15 │ 1126.56ms │ 1132.80ms │ no change │
│ QQuery 16 │ 3050.64ms │ 3088.31ms │ no change │
│ QQuery 17 │ 2746.65ms │ 2791.27ms │ no change │
│ QQuery 18 │ 5845.46ms │ 6064.71ms │ no change │
│ QQuery 19 │ 121.17ms │ 123.79ms │ no change │
│ QQuery 20 │ 1628.64ms │ 1387.74ms │ +1.17x faster │
│ QQuery 21 │ 2048.94ms │ 1738.74ms │ +1.18x faster │
│ QQuery 22 │ 4994.39ms │ 4252.83ms │ +1.17x faster │
│ QQuery 23 │ 11893.20ms │ 9996.52ms │ +1.19x faster │
│ QQuery 24 │ 775.43ms │ 669.76ms │ +1.16x faster │
│ QQuery 25 │ 684.66ms │ 601.04ms │ +1.14x faster │
│ QQuery 26 │ 873.53ms │ 738.14ms │ +1.18x faster │
│ QQuery 27 │ 2576.02ms │ 2098.18ms │ +1.23x faster │
│ QQuery 28 │ 15637.06ms │ 14166.05ms │ +1.10x faster │
│ QQuery 29 │ 565.58ms │ 551.64ms │ no change │
│ QQuery 30 │ 1305.23ms │ 1209.25ms │ +1.08x faster │
│ QQuery 31 │ 1389.60ms │ 1265.74ms │ +1.10x faster │
│ QQuery 32 │ 4741.08ms │ 4853.34ms │ no change │
│ QQuery 33 │ 5306.91ms │ 4193.22ms │ +1.27x faster │
│ QQuery 34 │ 5167.11ms │ 4222.33ms │ +1.22x faster │
│ QQuery 35 │ 1863.80ms │ 1859.61ms │ no change │
│ QQuery 36 │ 317.80ms │ 291.40ms │ +1.09x faster │
│ QQuery 37 │ 216.31ms │ 184.00ms │ +1.18x faster │
│ QQuery 38 │ 200.83ms │ 182.21ms │ +1.10x faster │
│ QQuery 39 │ 1038.73ms │ 851.30ms │ +1.22x faster │
│ QQuery 40 │ 86.40ms │ 88.35ms │ no change │
│ QQuery 41 │ 80.99ms │ 78.34ms │ no change │
│ QQuery 42 │ 93.00ms │ 97.02ms │ no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base) │ 87749.93ms │
│ Total Time (alamb_enable_string_view_by_default) │ 79626.03ms │
│ Average Time (main_base) │ 2040.70ms │
│ Average Time (alamb_enable_string_view_by_default) │ 1851.77ms │
│ Queries Faster │ 21 │
│ Queries Slower │ 0 │
│ Queries with No Change │ 22 │
└────────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main_base ┃ alamb_enable_string_view_by_def… ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │ 2.37ms │ 1.97ms │ +1.21x faster │
│ QQuery 1 │ 38.68ms │ 36.78ms │ no change │
│ QQuery 2 │ 95.07ms │ 94.11ms │ no change │
│ QQuery 3 │ 98.75ms │ 100.86ms │ no change │
│ QQuery 4 │ 927.07ms │ 931.67ms │ no change │
│ QQuery 5 │ 964.71ms │ 1043.29ms │ 1.08x slower │
│ QQuery 6 │ 36.40ms │ 33.21ms │ +1.10x faster │
│ QQuery 7 │ 42.34ms │ 40.83ms │ no change │
│ QQuery 8 │ 1440.57ms │ 1434.71ms │ no change │
│ QQuery 9 │ 1343.13ms │ 1345.13ms │ no change │
│ QQuery 10 │ 357.26ms │ 432.73ms │ 1.21x slower │
│ QQuery 11 │ 404.40ms │ 478.21ms │ 1.18x slower │
│ QQuery 12 │ 1094.88ms │ 1057.96ms │ no change │
│ QQuery 13 │ 1829.88ms │ 2013.06ms │ 1.10x slower │
│ QQuery 14 │ 1519.03ms │ 1516.29ms │ no change │
│ QQuery 15 │ 1085.44ms │ 1068.30ms │ no change │
│ QQuery 16 │ 2905.66ms │ 3086.20ms │ 1.06x slower │
│ QQuery 17 │ 2744.69ms │ 2810.55ms │ no change │
│ QQuery 18 │ 5845.13ms │ 6043.80ms │ no change │
│ QQuery 19 │ 95.15ms │ 89.50ms │ +1.06x faster │
│ QQuery 20 │ 1751.84ms │ 2165.42ms │ 1.24x slower │
│ QQuery 21 │ 2019.88ms │ 2379.20ms │ 1.18x slower │
│ QQuery 22 │ 5167.42ms │ 5642.45ms │ 1.09x slower │
│ QQuery 23 │ 10400.41ms │ 14124.07ms │ 1.36x slower │
│ QQuery 24 │ 584.30ms │ 656.67ms │ 1.12x slower │
│ QQuery 25 │ 498.02ms │ 519.50ms │ no change │
│ QQuery 26 │ 651.84ms │ 719.41ms │ 1.10x slower │
│ QQuery 27 │ 2525.75ms │ 2694.77ms │ 1.07x slower │
│ QQuery 28 │ 14905.83ms │ 26662.02ms │ 1.79x slower │
│ QQuery 29 │ 529.85ms │ 524.36ms │ no change │
│ QQuery 30 │ 1086.70ms │ 1212.89ms │ 1.12x slower │
│ QQuery 31 │ 1163.36ms │ 1291.25ms │ 1.11x slower │
│ QQuery 32 │ 4764.64ms │ 4782.92ms │ no change │
│ QQuery 33 │ 5149.48ms │ 4480.69ms │ +1.15x faster │
│ QQuery 34 │ 5162.88ms │ 4481.70ms │ +1.15x faster │
│ QQuery 35 │ 1811.72ms │ 1812.12ms │ no change │
│ QQuery 36 │ 282.31ms │ 268.12ms │ +1.05x faster │
│ QQuery 37 │ 121.31ms │ 104.96ms │ +1.16x faster │
│ QQuery 38 │ 147.57ms │ 176.74ms │ 1.20x slower │
│ QQuery 39 │ 975.45ms │ 848.76ms │ +1.15x faster │
│ QQuery 40 │ 61.35ms │ 57.33ms │ +1.07x faster │
│ QQuery 41 │ 50.15ms │ 47.45ms │ +1.06x faster │
│ QQuery 42 │ 63.67ms │ 62.35ms │ no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base) │ 82746.33ms │
│ Total Time (alamb_enable_string_view_by_default) │ 99374.32ms │
│ Average Time (main_base) │ 1924.33ms │
│ Average Time (alamb_enable_string_view_by_default) │ 2311.03ms │
│ Queries Faster │ 10 │
│ Queries Slower │ 16 │
│ Queries with No Change │ 17 │
└────────────────────────────────────────────────────┴────────────┘
What changes are included in this PR?
- Set
schema_force_view_typesto true
Are these changes tested?
Yes, by CI tests
Are there any user-facing changes?
- Faster reading of data from Parquet files
If you see an error related to StringView use, you can disable this feature using the schema_force_string_view option
> set datafusion.execution.parquet.schema_force_view_types = false;
0 row(s) fetched.
Elapsed 0.000 seconds.
Context
@XiangpengHao debugged these tests previously using https://github.com/apache/datafusion/pull/11862
Arrow-rs skips now the interval parts with 0? so interval tests are broken now
That is due to the arrow upgrade for sure -- you can see the changes needed here (in their own PR): https://github.com/apache/datafusion/pull/12032
I think the last remaining failure is due to https://github.com/apache/datafusion/issues/12123
Still working on benchmarks, but the code is looking good
Annoyingly, when I ran the benchmarks now I get an error like this:
Q20: SELECT COUNT(*) FROM hits WHERE "URL" LIKE '%google%';
Error: Context("type_coercion", Plan("There isn't a common type to coerce BinaryView and Utf8 in LIKE expression"))
I will figure it out
Update -- this happens in hits_partitioned where the types are inferred like this (aka binary fields rather than utf8)
> describe hits_partitioned
;
+-----------------------+-----------+-------------+
| column_name | data_type | is_nullable |
+-----------------------+-----------+-------------+
| WatchID | Int64 | YES |
| JavaEnable | Int16 | YES |
| Title | Binary | YES |
| GoodEvent | Int16 | YES |
| EventTime | Int64 | YES |
| EventDate | UInt16 | YES |
| CounterID | Int32 | YES |
| ClientIP | Int32 | YES |
| RegionID | Int32 | YES |
| UserID | Int64 | YES |
| CounterClass | Int16 | YES |
| OS | Int16 | YES |
| UserAgent | Int16 | YES |
| URL | Binary | YES |
| Referer | Binary | YES |
| IsRefresh | Int16 | YES |
| RefererCategoryID | Int16 | YES |
| RefererRegionID | Int32 | YES |
| URLCategoryID | Int16 | YES |
| URLRegionID | Int32 | YES |
| ResolutionWidth | Int16 | YES |
| ResolutionHeight | Int16 | YES |
| ResolutionDepth | Int16 | YES |
| FlashMajor | Int16 | YES |
| FlashMinor | Int16 | YES |
| FlashMinor2 | Binary | YES |
| NetMajor | Int16 | YES |
| NetMinor | Int16 | YES |
| UserAgentMajor | Int16 | YES |
| UserAgentMinor | Binary | YES |
| CookieEnable | Int16 | YES |
| JavascriptEnable | Int16 | YES |
| IsMobile | Int16 | YES |
| MobilePhone | Int16 | YES |
| MobilePhoneModel | Binary | YES |
| Params | Binary | YES |
| IPNetworkID | Int32 | YES |
| TraficSourceID | Int16 | YES |
| SearchEngineID | Int16 | YES |
| SearchPhrase | Binary | YES |
| AdvEngineID | Int16 | YES |
| IsArtifical | Int16 | YES |
| WindowClientWidth | Int16 | YES |
| WindowClientHeight | Int16 | YES |
| ClientTimeZone | Int16 | YES |
| ClientEventTime | Int64 | YES |
| SilverlightVersion1 | Int16 | YES |
| SilverlightVersion2 | Int16 | YES |
| SilverlightVersion3 | Int32 | YES |
| SilverlightVersion4 | Int16 | YES |
| PageCharset | Binary | YES |
| CodeVersion | Int32 | YES |
| IsLink | Int16 | YES |
| IsDownload | Int16 | YES |
| IsNotBounce | Int16 | YES |
| FUniqID | Int64 | YES |
| OriginalURL | Binary | YES |
| HID | Int32 | YES |
| IsOldCounter | Int16 | YES |
| IsEvent | Int16 | YES |
| IsParameter | Int16 | YES |
| DontCountHits | Int16 | YES |
| WithHash | Int16 | YES |
| HitColor | Binary | YES |
| LocalEventTime | Int64 | YES |
| Age | Int16 | YES |
| Sex | Int16 | YES |
| Income | Int16 | YES |
| Interests | Int16 | YES |
| Robotness | Int16 | YES |
| RemoteIP | Int32 | YES |
| WindowName | Int32 | YES |
| OpenerName | Int32 | YES |
| HistoryLength | Int16 | YES |
| BrowserLanguage | Binary | YES |
| BrowserCountry | Binary | YES |
| SocialNetwork | Binary | YES |
| SocialAction | Binary | YES |
| HTTPError | Int16 | YES |
| SendTiming | Int32 | YES |
| DNSTiming | Int32 | YES |
| ConnectTiming | Int32 | YES |
| ResponseStartTiming | Int32 | YES |
| ResponseEndTiming | Int32 | YES |
| FetchTiming | Int32 | YES |
| SocialSourceNetworkID | Int16 | YES |
| SocialSourcePage | Binary | YES |
| ParamPrice | Int64 | YES |
| ParamOrderID | Binary | YES |
| ParamCurrency | Binary | YES |
| ParamCurrencyID | Int16 | YES |
| OpenstatServiceName | Binary | YES |
| OpenstatCampaignID | Binary | YES |
| OpenstatAdID | Binary | YES |
| OpenstatSourceID | Binary | YES |
| UTMSource | Binary | YES |
| UTMMedium | Binary | YES |
| UTMCampaign | Binary | YES |
| UTMContent | Binary | YES |
| UTMTerm | Binary | YES |
| FromTag | Binary | YES |
| HasGCLID | Int16 | YES |
| RefererHash | Int64 | YES |
| URLHash | Int64 | YES |
| CLID | Int32 | YES |
+-----------------------+-----------+-------------+
105 row(s) fetched.
Elapsed 0.037 seconds.
I ran benchmarks and posted results, -- clickbench looks great 🚀 . However, interesting clickbench_partitioned got slower. I am pretty sure I know why (because the schema is resolved to BinaryView rather than Binary). Will review that later
I figured out what is going on (different than I thought). I believe StringView::slice() is quite a bit slower than StringArray::slice due to the fact it has a buffers field
The query is
SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\\.)?([^/]+)/.*$', '\\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer")
FROM hits_partitioned
WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
Some flamegraphs:
I will think about the best way to proceed here
I plan to take a closer look at this on Saturday if no one has beat me to it, cc @alamb
I plan to take a closer look at this on Saturday if no one has beat me to it, cc @alamb
Thanks @XiangpengHao -- I
- Filed https://github.com/apache/arrow-rs/issues/6408 to track the slicing thing upstream
- Updated https://github.com/apache/datafusion/issues/6906 with rationale and a potential design to improve the performance
Found and filed another slowdown
- https://github.com/apache/datafusion/issues/12509
Here is a proposal I think that could potentially fix the regression: https://github.com/apache/datafusion/issues/12509#issuecomment-2366775581 (basically push the casting down into ParquetExec). 🤔
Benchmarks results
clickbench_1:
Is slower due to https://github.com/apache/datafusion/issues/12771
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main_base ┃ alamb_enable_string_view_by_def… ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │ 0.66ms │ 0.69ms │ no change │
│ QQuery 1 │ 71.33ms │ 75.13ms │ 1.05x slower │
│ QQuery 2 │ 131.03ms │ 127.11ms │ no change │
│ QQuery 3 │ 126.47ms │ 130.56ms │ no change │
│ QQuery 4 │ 980.91ms │ 979.41ms │ no change │
│ QQuery 5 │ 1039.82ms │ 1074.08ms │ no change │
│ QQuery 6 │ 63.36ms │ 65.10ms │ no change │
│ QQuery 7 │ 81.92ms │ 81.78ms │ no change │
│ QQuery 8 │ 1349.24ms │ 1307.04ms │ no change │
│ QQuery 9 │ 1385.82ms │ 1389.54ms │ no change │
│ QQuery 10 │ 451.48ms │ 343.72ms │ +1.31x faster │
│ QQuery 11 │ 490.86ms │ 399.73ms │ +1.23x faster │
│ QQuery 12 │ 1185.68ms │ 1105.26ms │ +1.07x faster │
│ QQuery 13 │ 1807.96ms │ 2124.67ms │ 1.18x slower │
│ QQuery 14 │ 1316.22ms │ 1544.26ms │ 1.17x slower │
│ QQuery 15 │ 1134.52ms │ 1127.19ms │ no change │
│ QQuery 16 │ 2585.31ms │ 3063.20ms │ 1.18x slower │
│ QQuery 17 │ 2377.87ms │ 2806.52ms │ 1.18x slower │
│ QQuery 18 │ 5039.50ms │ 5931.83ms │ 1.18x slower │
│ QQuery 19 │ 123.28ms │ 118.74ms │ no change │
│ QQuery 20 │ 1620.67ms │ 1387.02ms │ +1.17x faster │
│ QQuery 21 │ 2066.28ms │ 1716.94ms │ +1.20x faster │
│ QQuery 22 │ 4534.60ms │ 4257.65ms │ +1.07x faster │
│ QQuery 23 │ 11757.43ms │ 10161.49ms │ +1.16x faster │
│ QQuery 24 │ 751.56ms │ 677.31ms │ +1.11x faster │
│ QQuery 25 │ 661.76ms │ 583.94ms │ +1.13x faster │
│ QQuery 26 │ 827.91ms │ 778.19ms │ +1.06x faster │
│ QQuery 27 │ 2542.23ms │ 2090.62ms │ +1.22x faster │
│ QQuery 28 │ 15396.30ms │ 14334.92ms │ +1.07x faster │
│ QQuery 29 │ 570.84ms │ 572.35ms │ no change │
│ QQuery 30 │ 1188.89ms │ 1142.40ms │ no change │
│ QQuery 31 │ 1243.82ms │ 1173.04ms │ +1.06x faster │
│ QQuery 32 │ 4336.24ms │ 4174.83ms │ no change │
│ QQuery 33 │ 5266.84ms │ 4255.56ms │ +1.24x faster │
│ QQuery 34 │ 5350.35ms │ 4244.46ms │ +1.26x faster │
│ QQuery 35 │ 1781.95ms │ 1788.24ms │ no change │
│ QQuery 36 │ 319.41ms │ 284.84ms │ +1.12x faster │
│ QQuery 37 │ 207.99ms │ 183.65ms │ +1.13x faster │
│ QQuery 38 │ 194.13ms │ 184.99ms │ no change │
│ QQuery 39 │ 768.26ms │ 816.16ms │ 1.06x slower │
│ QQuery 40 │ 84.86ms │ 85.84ms │ no change │
│ QQuery 41 │ 84.45ms │ 79.66ms │ +1.06x faster │
│ QQuery 42 │ 96.44ms │ 97.44ms │ no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base) │ 83396.44ms │
│ Total Time (alamb_enable_string_view_by_default) │ 78867.11ms │
│ Average Time (main_base) │ 1939.45ms │
│ Average Time (alamb_enable_string_view_by_default) │ 1834.12ms │
│ Queries Faster │ 18 │
│ Queries Slower │ 7 │
│ Queries with No Change │ 18 │
└────────────────────────────────────────────────────┴────────────┘
--------------------
clickbench_extended
Looking good here
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main_base ┃ alamb_enable_string_view_by_def… ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │ 2741.98ms │ 2616.63ms │ no change │
│ QQuery 1 │ 759.57ms │ 681.31ms │ +1.11x faster │
│ QQuery 2 │ 1625.61ms │ 1398.16ms │ +1.16x faster │
│ QQuery 3 │ 680.35ms │ 683.42ms │ no change │
│ QQuery 4 │ 12760.38ms │ 12498.84ms │ no change │
│ QQuery 5 │ 19106.95ms │ 19157.16ms │ no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
clickbench_partitioned
Slowing down also due to https://github.com/apache/datafusion/issues/12509 / https://github.com/apache/datafusion/issues/12788
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main_base ┃ alamb_enable_string_view_by_def… ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │ 2.20ms │ 2.23ms │ no change │
│ QQuery 1 │ 40.39ms │ 40.67ms │ no change │
│ QQuery 2 │ 96.55ms │ 98.55ms │ no change │
│ QQuery 3 │ 99.47ms │ 99.17ms │ no change │
│ QQuery 4 │ 924.46ms │ 922.44ms │ no change │
│ QQuery 5 │ 950.04ms │ 968.54ms │ no change │
│ QQuery 6 │ 34.09ms │ 33.63ms │ no change │
│ QQuery 7 │ 43.25ms │ 43.24ms │ no change │
│ QQuery 8 │ 1366.70ms │ 1350.65ms │ no change │
│ QQuery 9 │ 1328.80ms │ 1344.05ms │ no change │
│ QQuery 10 │ 348.25ms │ 314.41ms │ +1.11x faster │
│ QQuery 11 │ 395.56ms │ 360.36ms │ +1.10x faster │
│ QQuery 12 │ 1096.67ms │ 968.45ms │ +1.13x faster │
│ QQuery 13 │ 1749.70ms │ 1747.58ms │ no change │
│ QQuery 14 │ 1209.17ms │ 1406.79ms │ 1.16x slower │
│ QQuery 15 │ 1085.42ms │ 1077.92ms │ no change │
│ QQuery 16 │ 2491.88ms │ 2904.94ms │ 1.17x slower │
│ QQuery 17 │ 2297.53ms │ 2705.51ms │ 1.18x slower │
│ QQuery 18 │ 5009.23ms │ 5787.71ms │ 1.16x slower │
│ QQuery 19 │ 94.74ms │ 90.64ms │ no change │
│ QQuery 20 │ 1697.79ms │ 1821.09ms │ 1.07x slower │
│ QQuery 21 │ 1997.89ms │ 1907.50ms │ no change │
│ QQuery 22 │ 4709.43ms │ 4818.95ms │ no change │
│ QQuery 23 │ 10513.97ms │ 10145.30ms │ no change │
│ QQuery 24 │ 583.88ms │ 486.20ms │ +1.20x faster │
│ QQuery 25 │ 488.55ms │ 408.51ms │ +1.20x faster │
│ QQuery 26 │ 657.96ms │ 566.02ms │ +1.16x faster │
│ QQuery 27 │ 2529.00ms │ 2311.10ms │ +1.09x faster │
│ QQuery 28 │ 14698.76ms │ 26994.08ms │ 1.84x slower │
│ QQuery 29 │ 521.52ms │ 544.05ms │ no change │
│ QQuery 30 │ 1043.00ms │ 1014.79ms │ no change │
│ QQuery 31 │ 1111.27ms │ 1057.91ms │ no change │
│ QQuery 32 │ 4308.74ms │ 4254.27ms │ no change │
│ QQuery 33 │ 5276.47ms │ 4134.58ms │ +1.28x faster │
│ QQuery 34 │ 5342.06ms │ 4150.13ms │ +1.29x faster │
│ QQuery 35 │ 1905.15ms │ 1857.24ms │ no change │
│ QQuery 36 │ 276.11ms │ 238.65ms │ +1.16x faster │
│ QQuery 37 │ 120.85ms │ 87.17ms │ +1.39x faster │
│ QQuery 38 │ 144.61ms │ 152.75ms │ 1.06x slower │
│ QQuery 39 │ 739.13ms │ 759.31ms │ no change │
│ QQuery 40 │ 56.56ms │ 55.03ms │ no change │
│ QQuery 41 │ 48.43ms │ 50.81ms │ no change │
│ QQuery 42 │ 63.36ms │ 63.27ms │ no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
Current status: https://github.com/apache/datafusion/issues/11682#issuecomment-2396551895
My plan is to pull the changes from the following PRs into this PR and rerun the overall perf test
- [x] https://github.com/apache/datafusion/pull/12792
- [ ] https://github.com/apache/datafusion/pull/12816 @goldmedal
- [x] https://github.com/apache/datafusion/pull/12809 from @Rachelint
Hopefully that will show the performance we need. Then we just need to get the various PRs actually merged and we can do this 😅
Here are the current performance results: 🚀 Basically 10% faster across all queries and no slow downs
👏 @Rachelint @jayzhan211 @XiangpengHao and many many others.
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main_base ┃ alamb_enable_string_view_by_def… ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │ 0.70ms │ 0.67ms │ no change │
│ QQuery 1 │ 75.35ms │ 69.82ms │ +1.08x faster │
│ QQuery 2 │ 123.15ms │ 128.37ms │ no change │
│ QQuery 3 │ 129.02ms │ 128.14ms │ no change │
│ QQuery 4 │ 947.97ms │ 967.71ms │ no change │
│ QQuery 5 │ 1086.19ms │ 1060.35ms │ no change │
│ QQuery 6 │ 65.49ms │ 66.43ms │ no change │
│ QQuery 7 │ 80.72ms │ 84.38ms │ no change │
│ QQuery 8 │ 1330.08ms │ 1332.41ms │ no change │
│ QQuery 9 │ 1345.21ms │ 1344.89ms │ no change │
│ QQuery 10 │ 443.35ms │ 335.50ms │ +1.32x faster │
│ QQuery 11 │ 493.80ms │ 372.10ms │ +1.33x faster │
│ QQuery 12 │ 1205.69ms │ 1101.01ms │ +1.10x faster │
│ QQuery 13 │ 1921.43ms │ 1617.54ms │ +1.19x faster │
│ QQuery 14 │ 1355.42ms │ 1146.40ms │ +1.18x faster │
│ QQuery 15 │ 1106.21ms │ 1129.07ms │ no change │
│ QQuery 16 │ 2546.11ms │ 2434.61ms │ no change │
│ QQuery 17 │ 2358.85ms │ 2269.62ms │ no change │
│ QQuery 18 │ 4944.14ms │ 5101.45ms │ no change │
│ QQuery 19 │ 121.48ms │ 124.59ms │ no change │
│ QQuery 20 │ 1637.28ms │ 1377.14ms │ +1.19x faster │
│ QQuery 21 │ 2106.73ms │ 1716.14ms │ +1.23x faster │
│ QQuery 22 │ 5050.72ms │ 4231.71ms │ +1.19x faster │
│ QQuery 23 │ 12149.30ms │ 9998.89ms │ +1.22x faster │
│ QQuery 24 │ 793.52ms │ 683.81ms │ +1.16x faster │
│ QQuery 25 │ 700.98ms │ 587.52ms │ +1.19x faster │
│ QQuery 26 │ 860.57ms │ 725.23ms │ +1.19x faster │
│ QQuery 27 │ 2540.34ms │ 2104.51ms │ +1.21x faster │
│ QQuery 28 │ 14015.68ms │ 14587.90ms │ no change │
│ QQuery 29 │ 571.20ms │ 554.32ms │ no change │
│ QQuery 30 │ 1239.94ms │ 1118.25ms │ +1.11x faster │
│ QQuery 31 │ 1276.58ms │ 1189.58ms │ +1.07x faster │
│ QQuery 32 │ 4230.09ms │ 4204.57ms │ no change │
│ QQuery 33 │ 5223.33ms │ 4283.46ms │ +1.22x faster │
│ QQuery 34 │ 5328.12ms │ 4264.81ms │ +1.25x faster │
│ QQuery 35 │ 1754.78ms │ 1797.85ms │ no change │
│ QQuery 36 │ 306.94ms │ 287.75ms │ +1.07x faster │
│ QQuery 37 │ 220.70ms │ 182.40ms │ +1.21x faster │
│ QQuery 38 │ 192.66ms │ 188.30ms │ no change │
│ QQuery 39 │ 790.63ms │ 551.89ms │ +1.43x faster │
│ QQuery 40 │ 86.56ms │ 84.86ms │ no change │
│ QQuery 41 │ 83.08ms │ 77.75ms │ +1.07x faster │
│ QQuery 42 │ 93.76ms │ 94.03ms │ no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base) │ 82933.83ms │
│ Total Time (alamb_enable_string_view_by_default) │ 75707.73ms │
│ Average Time (main_base) │ 1928.69ms │
│ Average Time (alamb_enable_string_view_by_default) │ 1760.64ms │
│ Queries Faster │ 22 │
│ Queries Slower │ 0 │
│ Queries with No Change │ 21 │
└────────────────────────────────────────────────────┴────────────┘
Also for partitioned:
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ main_base ┃ alamb_enable_string_view_by_def… ┃ Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │ 2.24ms │ 2.22ms │ no change │
│ QQuery 1 │ 37.96ms │ 38.15ms │ no change │
│ QQuery 2 │ 95.79ms │ 94.43ms │ no change │
│ QQuery 3 │ 102.41ms │ 100.80ms │ no change │
│ QQuery 4 │ 927.15ms │ 930.56ms │ no change │
│ QQuery 5 │ 973.96ms │ 951.87ms │ no change │
│ QQuery 6 │ 34.33ms │ 33.58ms │ no change │
│ QQuery 7 │ 42.82ms │ 41.29ms │ no change │
│ QQuery 8 │ 1364.76ms │ 1344.57ms │ no change │
│ QQuery 9 │ 1309.61ms │ 1365.89ms │ no change │
│ QQuery 10 │ 346.30ms │ 307.70ms │ +1.13x faster │
│ QQuery 11 │ 394.60ms │ 347.22ms │ +1.14x faster │
│ QQuery 12 │ 1097.07ms │ 1007.94ms │ +1.09x faster │
│ QQuery 13 │ 1636.51ms │ 1602.93ms │ no change │
│ QQuery 14 │ 1197.28ms │ 1069.49ms │ +1.12x faster │
│ QQuery 15 │ 1068.82ms │ 1077.73ms │ no change │
│ QQuery 16 │ 2485.04ms │ 2398.93ms │ no change │
│ QQuery 17 │ 2275.37ms │ 2212.18ms │ no change │
│ QQuery 18 │ 4952.47ms │ 5114.44ms │ no change │
│ QQuery 19 │ 96.01ms │ 92.62ms │ no change │
│ QQuery 20 │ 1719.75ms │ 1254.03ms │ +1.37x faster │
│ QQuery 21 │ 2001.12ms │ 1481.53ms │ +1.35x faster │
│ QQuery 22 │ 5169.50ms │ 2649.07ms │ +1.95x faster │
│ QQuery 23 │ 10586.32ms │ 8619.20ms │ +1.23x faster │
│ QQuery 24 │ 594.74ms │ 517.28ms │ +1.15x faster │
│ QQuery 25 │ 495.22ms │ 427.98ms │ +1.16x faster │
│ QQuery 26 │ 660.64ms │ 565.71ms │ +1.17x faster │
│ QQuery 27 │ 2557.47ms │ 1846.03ms │ +1.39x faster │
│ QQuery 28 │ 13399.41ms │ 13583.18ms │ no change │
│ QQuery 29 │ 535.81ms │ 524.10ms │ no change │
│ QQuery 30 │ 1034.30ms │ 964.20ms │ +1.07x faster │
│ QQuery 31 │ 1101.14ms │ 1035.81ms │ +1.06x faster │
│ QQuery 32 │ 4202.55ms │ 4049.53ms │ no change │
│ QQuery 33 │ 5136.86ms │ 3916.87ms │ +1.31x faster │
│ QQuery 34 │ 5104.18ms │ 3932.13ms │ +1.30x faster │
│ QQuery 35 │ 1909.94ms │ 1797.84ms │ +1.06x faster │
│ QQuery 36 │ 264.07ms │ 225.04ms │ +1.17x faster │
│ QQuery 37 │ 121.76ms │ 92.57ms │ +1.32x faster │
│ QQuery 38 │ 147.92ms │ 131.16ms │ +1.13x faster │
│ QQuery 39 │ 749.30ms │ 477.60ms │ +1.57x faster │
│ QQuery 40 │ 56.82ms │ 55.17ms │ no change │
│ QQuery 41 │ 47.54ms │ 45.12ms │ +1.05x faster │
│ QQuery 42 │ 63.14ms │ 64.81ms │ no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary ┃ ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base) │ 78099.98ms │
│ Total Time (alamb_enable_string_view_by_default) │ 68390.51ms │
│ Average Time (main_base) │ 1816.28ms │
│ Average Time (alamb_enable_string_view_by_default) │ 1590.48ms │
│ Queries Faster │ 22 │
│ Queries Slower │ 0 │
│ Queries with No Change │ 21 │
└────────────────────────────────────────────────────┴────────────┘
Next steps are to get the various PRs merged, get this one ready, and then write all about it.
I made a new PR as this one has lots of now irrelevant historical context
New PR here: https://github.com/apache/datafusion/pull/13101