datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Enable reading `StringViewArray` by default from Parquet

Open alamb opened this issue 1 year ago • 17 comments

Draft as it builds on:

  • [x] Debug tests
  • [x] File issues found
  • [x] https://github.com/apache/datafusion/pull/12724
  • [x] https://github.com/apache/datafusion/pull/12792
  • [ ] https://github.com/apache/datafusion/pull/12816 @goldmedal
  • [x] https://github.com/apache/datafusion/pull/12809 from @Rachelint
  • [x] Fix slowdown with querying hits_partitioned

Which issue does this PR close?

Closes https://github.com/apache/datafusion/issues/11682

Rationale for this change

Reading data as StringViewArray is significantly faster than StringArray. We have been testing this behind a feature flag but it is now stable enough to enable by default.

See blog post https://github.com/apache/datafusion/issues/11603:

  • https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/
  • https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/

Benchmark Results

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_enable_string_view_by_def… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.66ms │                           0.67ms │     no change │
│ QQuery 1     │    72.07ms │                          70.10ms │     no change │
│ QQuery 2     │   119.92ms │                         124.84ms │     no change │
│ QQuery 3     │   128.29ms │                         134.03ms │     no change │
│ QQuery 4     │   964.03ms │                         974.14ms │     no change │
│ QQuery 5     │  1088.23ms │                        1060.44ms │     no change │
│ QQuery 6     │    66.53ms │                          64.39ms │     no change │
│ QQuery 7     │    83.84ms │                          80.44ms │     no change │
│ QQuery 8     │  1484.01ms │                        1466.81ms │     no change │
│ QQuery 9     │  1366.17ms │                        1378.33ms │     no change │
│ QQuery 10    │   465.55ms │                         362.07ms │ +1.29x faster │
│ QQuery 11    │   514.19ms │                         398.94ms │ +1.29x faster │
│ QQuery 12    │  1195.47ms │                        1098.46ms │ +1.09x faster │
│ QQuery 13    │  2183.91ms │                        2083.70ms │     no change │
│ QQuery 14    │  1641.36ms │                        1551.03ms │ +1.06x faster │
│ QQuery 15    │  1126.56ms │                        1132.80ms │     no change │
│ QQuery 16    │  3050.64ms │                        3088.31ms │     no change │
│ QQuery 17    │  2746.65ms │                        2791.27ms │     no change │
│ QQuery 18    │  5845.46ms │                        6064.71ms │     no change │
│ QQuery 19    │   121.17ms │                         123.79ms │     no change │
│ QQuery 20    │  1628.64ms │                        1387.74ms │ +1.17x faster │
│ QQuery 21    │  2048.94ms │                        1738.74ms │ +1.18x faster │
│ QQuery 22    │  4994.39ms │                        4252.83ms │ +1.17x faster │
│ QQuery 23    │ 11893.20ms │                        9996.52ms │ +1.19x faster │
│ QQuery 24    │   775.43ms │                         669.76ms │ +1.16x faster │
│ QQuery 25    │   684.66ms │                         601.04ms │ +1.14x faster │
│ QQuery 26    │   873.53ms │                         738.14ms │ +1.18x faster │
│ QQuery 27    │  2576.02ms │                        2098.18ms │ +1.23x faster │
│ QQuery 28    │ 15637.06ms │                       14166.05ms │ +1.10x faster │
│ QQuery 29    │   565.58ms │                         551.64ms │     no change │
│ QQuery 30    │  1305.23ms │                        1209.25ms │ +1.08x faster │
│ QQuery 31    │  1389.60ms │                        1265.74ms │ +1.10x faster │
│ QQuery 32    │  4741.08ms │                        4853.34ms │     no change │
│ QQuery 33    │  5306.91ms │                        4193.22ms │ +1.27x faster │
│ QQuery 34    │  5167.11ms │                        4222.33ms │ +1.22x faster │
│ QQuery 35    │  1863.80ms │                        1859.61ms │     no change │
│ QQuery 36    │   317.80ms │                         291.40ms │ +1.09x faster │
│ QQuery 37    │   216.31ms │                         184.00ms │ +1.18x faster │
│ QQuery 38    │   200.83ms │                         182.21ms │ +1.10x faster │
│ QQuery 39    │  1038.73ms │                         851.30ms │ +1.22x faster │
│ QQuery 40    │    86.40ms │                          88.35ms │     no change │
│ QQuery 41    │    80.99ms │                          78.34ms │     no change │
│ QQuery 42    │    93.00ms │                          97.02ms │     no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)                             │ 87749.93ms │
│ Total Time (alamb_enable_string_view_by_default)   │ 79626.03ms │
│ Average Time (main_base)                           │  2040.70ms │
│ Average Time (alamb_enable_string_view_by_default) │  1851.77ms │
│ Queries Faster                                     │         21 │
│ Queries Slower                                     │          0 │
│ Queries with No Change                             │         22 │
└────────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_enable_string_view_by_def… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.37ms │                           1.97ms │ +1.21x faster │
│ QQuery 1     │    38.68ms │                          36.78ms │     no change │
│ QQuery 2     │    95.07ms │                          94.11ms │     no change │
│ QQuery 3     │    98.75ms │                         100.86ms │     no change │
│ QQuery 4     │   927.07ms │                         931.67ms │     no change │
│ QQuery 5     │   964.71ms │                        1043.29ms │  1.08x slower │
│ QQuery 6     │    36.40ms │                          33.21ms │ +1.10x faster │
│ QQuery 7     │    42.34ms │                          40.83ms │     no change │
│ QQuery 8     │  1440.57ms │                        1434.71ms │     no change │
│ QQuery 9     │  1343.13ms │                        1345.13ms │     no change │
│ QQuery 10    │   357.26ms │                         432.73ms │  1.21x slower │
│ QQuery 11    │   404.40ms │                         478.21ms │  1.18x slower │
│ QQuery 12    │  1094.88ms │                        1057.96ms │     no change │
│ QQuery 13    │  1829.88ms │                        2013.06ms │  1.10x slower │
│ QQuery 14    │  1519.03ms │                        1516.29ms │     no change │
│ QQuery 15    │  1085.44ms │                        1068.30ms │     no change │
│ QQuery 16    │  2905.66ms │                        3086.20ms │  1.06x slower │
│ QQuery 17    │  2744.69ms │                        2810.55ms │     no change │
│ QQuery 18    │  5845.13ms │                        6043.80ms │     no change │
│ QQuery 19    │    95.15ms │                          89.50ms │ +1.06x faster │
│ QQuery 20    │  1751.84ms │                        2165.42ms │  1.24x slower │
│ QQuery 21    │  2019.88ms │                        2379.20ms │  1.18x slower │
│ QQuery 22    │  5167.42ms │                        5642.45ms │  1.09x slower │
│ QQuery 23    │ 10400.41ms │                       14124.07ms │  1.36x slower │
│ QQuery 24    │   584.30ms │                         656.67ms │  1.12x slower │
│ QQuery 25    │   498.02ms │                         519.50ms │     no change │
│ QQuery 26    │   651.84ms │                         719.41ms │  1.10x slower │
│ QQuery 27    │  2525.75ms │                        2694.77ms │  1.07x slower │
│ QQuery 28    │ 14905.83ms │                       26662.02ms │  1.79x slower │
│ QQuery 29    │   529.85ms │                         524.36ms │     no change │
│ QQuery 30    │  1086.70ms │                        1212.89ms │  1.12x slower │
│ QQuery 31    │  1163.36ms │                        1291.25ms │  1.11x slower │
│ QQuery 32    │  4764.64ms │                        4782.92ms │     no change │
│ QQuery 33    │  5149.48ms │                        4480.69ms │ +1.15x faster │
│ QQuery 34    │  5162.88ms │                        4481.70ms │ +1.15x faster │
│ QQuery 35    │  1811.72ms │                        1812.12ms │     no change │
│ QQuery 36    │   282.31ms │                         268.12ms │ +1.05x faster │
│ QQuery 37    │   121.31ms │                         104.96ms │ +1.16x faster │
│ QQuery 38    │   147.57ms │                         176.74ms │  1.20x slower │
│ QQuery 39    │   975.45ms │                         848.76ms │ +1.15x faster │
│ QQuery 40    │    61.35ms │                          57.33ms │ +1.07x faster │
│ QQuery 41    │    50.15ms │                          47.45ms │ +1.06x faster │
│ QQuery 42    │    63.67ms │                          62.35ms │     no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)                             │ 82746.33ms │
│ Total Time (alamb_enable_string_view_by_default)   │ 99374.32ms │
│ Average Time (main_base)                           │  1924.33ms │
│ Average Time (alamb_enable_string_view_by_default) │  2311.03ms │
│ Queries Faster                                     │         10 │
│ Queries Slower                                     │         16 │
│ Queries with No Change                             │         17 │
└────────────────────────────────────────────────────┴────────────┘

What changes are included in this PR?

  1. Set schema_force_view_types to true

Are these changes tested?

Yes, by CI tests

Are there any user-facing changes?

  1. Faster reading of data from Parquet files

If you see an error related to StringView use, you can disable this feature using the schema_force_string_view option

> set datafusion.execution.parquet.schema_force_view_types = false;
0 row(s) fetched.
Elapsed 0.000 seconds.

Context

@XiangpengHao debugged these tests previously using https://github.com/apache/datafusion/pull/11862

alamb avatar Aug 20 '24 19:08 alamb

Arrow-rs skips now the interval parts with 0? so interval tests are broken now

That is due to the arrow upgrade for sure -- you can see the changes needed here (in their own PR): https://github.com/apache/datafusion/pull/12032

alamb avatar Aug 22 '24 16:08 alamb

I think the last remaining failure is due to https://github.com/apache/datafusion/issues/12123

alamb avatar Aug 22 '24 21:08 alamb

Still working on benchmarks, but the code is looking good

alamb avatar Sep 12 '24 21:09 alamb

Annoyingly, when I ran the benchmarks now I get an error like this:

Q20: SELECT COUNT(*) FROM hits WHERE "URL" LIKE '%google%';
Error: Context("type_coercion", Plan("There isn't a common type to coerce BinaryView and Utf8 in LIKE expression"))

I will figure it out

Update -- this happens in hits_partitioned where the types are inferred like this (aka binary fields rather than utf8)

> describe hits_partitioned
;
+-----------------------+-----------+-------------+
| column_name           | data_type | is_nullable |
+-----------------------+-----------+-------------+
| WatchID               | Int64     | YES         |
| JavaEnable            | Int16     | YES         |
| Title                 | Binary    | YES         |
| GoodEvent             | Int16     | YES         |
| EventTime             | Int64     | YES         |
| EventDate             | UInt16    | YES         |
| CounterID             | Int32     | YES         |
| ClientIP              | Int32     | YES         |
| RegionID              | Int32     | YES         |
| UserID                | Int64     | YES         |
| CounterClass          | Int16     | YES         |
| OS                    | Int16     | YES         |
| UserAgent             | Int16     | YES         |
| URL                   | Binary    | YES         |
| Referer               | Binary    | YES         |
| IsRefresh             | Int16     | YES         |
| RefererCategoryID     | Int16     | YES         |
| RefererRegionID       | Int32     | YES         |
| URLCategoryID         | Int16     | YES         |
| URLRegionID           | Int32     | YES         |
| ResolutionWidth       | Int16     | YES         |
| ResolutionHeight      | Int16     | YES         |
| ResolutionDepth       | Int16     | YES         |
| FlashMajor            | Int16     | YES         |
| FlashMinor            | Int16     | YES         |
| FlashMinor2           | Binary    | YES         |
| NetMajor              | Int16     | YES         |
| NetMinor              | Int16     | YES         |
| UserAgentMajor        | Int16     | YES         |
| UserAgentMinor        | Binary    | YES         |
| CookieEnable          | Int16     | YES         |
| JavascriptEnable      | Int16     | YES         |
| IsMobile              | Int16     | YES         |
| MobilePhone           | Int16     | YES         |
| MobilePhoneModel      | Binary    | YES         |
| Params                | Binary    | YES         |
| IPNetworkID           | Int32     | YES         |
| TraficSourceID        | Int16     | YES         |
| SearchEngineID        | Int16     | YES         |
| SearchPhrase          | Binary    | YES         |
| AdvEngineID           | Int16     | YES         |
| IsArtifical           | Int16     | YES         |
| WindowClientWidth     | Int16     | YES         |
| WindowClientHeight    | Int16     | YES         |
| ClientTimeZone        | Int16     | YES         |
| ClientEventTime       | Int64     | YES         |
| SilverlightVersion1   | Int16     | YES         |
| SilverlightVersion2   | Int16     | YES         |
| SilverlightVersion3   | Int32     | YES         |
| SilverlightVersion4   | Int16     | YES         |
| PageCharset           | Binary    | YES         |
| CodeVersion           | Int32     | YES         |
| IsLink                | Int16     | YES         |
| IsDownload            | Int16     | YES         |
| IsNotBounce           | Int16     | YES         |
| FUniqID               | Int64     | YES         |
| OriginalURL           | Binary    | YES         |
| HID                   | Int32     | YES         |
| IsOldCounter          | Int16     | YES         |
| IsEvent               | Int16     | YES         |
| IsParameter           | Int16     | YES         |
| DontCountHits         | Int16     | YES         |
| WithHash              | Int16     | YES         |
| HitColor              | Binary    | YES         |
| LocalEventTime        | Int64     | YES         |
| Age                   | Int16     | YES         |
| Sex                   | Int16     | YES         |
| Income                | Int16     | YES         |
| Interests             | Int16     | YES         |
| Robotness             | Int16     | YES         |
| RemoteIP              | Int32     | YES         |
| WindowName            | Int32     | YES         |
| OpenerName            | Int32     | YES         |
| HistoryLength         | Int16     | YES         |
| BrowserLanguage       | Binary    | YES         |
| BrowserCountry        | Binary    | YES         |
| SocialNetwork         | Binary    | YES         |
| SocialAction          | Binary    | YES         |
| HTTPError             | Int16     | YES         |
| SendTiming            | Int32     | YES         |
| DNSTiming             | Int32     | YES         |
| ConnectTiming         | Int32     | YES         |
| ResponseStartTiming   | Int32     | YES         |
| ResponseEndTiming     | Int32     | YES         |
| FetchTiming           | Int32     | YES         |
| SocialSourceNetworkID | Int16     | YES         |
| SocialSourcePage      | Binary    | YES         |
| ParamPrice            | Int64     | YES         |
| ParamOrderID          | Binary    | YES         |
| ParamCurrency         | Binary    | YES         |
| ParamCurrencyID       | Int16     | YES         |
| OpenstatServiceName   | Binary    | YES         |
| OpenstatCampaignID    | Binary    | YES         |
| OpenstatAdID          | Binary    | YES         |
| OpenstatSourceID      | Binary    | YES         |
| UTMSource             | Binary    | YES         |
| UTMMedium             | Binary    | YES         |
| UTMCampaign           | Binary    | YES         |
| UTMContent            | Binary    | YES         |
| UTMTerm               | Binary    | YES         |
| FromTag               | Binary    | YES         |
| HasGCLID              | Int16     | YES         |
| RefererHash           | Int64     | YES         |
| URLHash               | Int64     | YES         |
| CLID                  | Int32     | YES         |
+-----------------------+-----------+-------------+
105 row(s) fetched.
Elapsed 0.037 seconds.

alamb avatar Sep 13 '24 17:09 alamb

I ran benchmarks and posted results, -- clickbench looks great 🚀 . However, interesting clickbench_partitioned got slower. I am pretty sure I know why (because the schema is resolved to BinaryView rather than Binary). Will review that later

alamb avatar Sep 13 '24 19:09 alamb

I figured out what is going on (different than I thought). I believe StringView::slice() is quite a bit slower than StringArray::slice due to the fact it has a buffers field

The query is

SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\\.)?([^/]+)/.*$', '\\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer")
FROM hits_partitioned
WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;

Some flamegraphs: flamegraph-main flamegraph-string-view

Screenshot 2024-09-16 at 4 44 50 PM

I will think about the best way to proceed here

alamb avatar Sep 16 '24 20:09 alamb

I plan to take a closer look at this on Saturday if no one has beat me to it, cc @alamb

XiangpengHao avatar Sep 16 '24 21:09 XiangpengHao

I plan to take a closer look at this on Saturday if no one has beat me to it, cc @alamb

Thanks @XiangpengHao -- I

  • Filed https://github.com/apache/arrow-rs/issues/6408 to track the slicing thing upstream
  • Updated https://github.com/apache/datafusion/issues/6906 with rationale and a potential design to improve the performance

alamb avatar Sep 17 '24 12:09 alamb

Found and filed another slowdown

  • https://github.com/apache/datafusion/issues/12509

alamb avatar Sep 17 '24 15:09 alamb

Here is a proposal I think that could potentially fix the regression: https://github.com/apache/datafusion/issues/12509#issuecomment-2366775581 (basically push the casting down into ParquetExec). 🤔

alamb avatar Sep 22 '24 12:09 alamb

Benchmarks results

clickbench_1:

Is slower due to https://github.com/apache/datafusion/issues/12771

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_enable_string_view_by_def… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.66ms │                           0.69ms │     no change │
│ QQuery 1     │    71.33ms │                          75.13ms │  1.05x slower │
│ QQuery 2     │   131.03ms │                         127.11ms │     no change │
│ QQuery 3     │   126.47ms │                         130.56ms │     no change │
│ QQuery 4     │   980.91ms │                         979.41ms │     no change │
│ QQuery 5     │  1039.82ms │                        1074.08ms │     no change │
│ QQuery 6     │    63.36ms │                          65.10ms │     no change │
│ QQuery 7     │    81.92ms │                          81.78ms │     no change │
│ QQuery 8     │  1349.24ms │                        1307.04ms │     no change │
│ QQuery 9     │  1385.82ms │                        1389.54ms │     no change │
│ QQuery 10    │   451.48ms │                         343.72ms │ +1.31x faster │
│ QQuery 11    │   490.86ms │                         399.73ms │ +1.23x faster │
│ QQuery 12    │  1185.68ms │                        1105.26ms │ +1.07x faster │
│ QQuery 13    │  1807.96ms │                        2124.67ms │  1.18x slower │
│ QQuery 14    │  1316.22ms │                        1544.26ms │  1.17x slower │
│ QQuery 15    │  1134.52ms │                        1127.19ms │     no change │
│ QQuery 16    │  2585.31ms │                        3063.20ms │  1.18x slower │
│ QQuery 17    │  2377.87ms │                        2806.52ms │  1.18x slower │
│ QQuery 18    │  5039.50ms │                        5931.83ms │  1.18x slower │
│ QQuery 19    │   123.28ms │                         118.74ms │     no change │
│ QQuery 20    │  1620.67ms │                        1387.02ms │ +1.17x faster │
│ QQuery 21    │  2066.28ms │                        1716.94ms │ +1.20x faster │
│ QQuery 22    │  4534.60ms │                        4257.65ms │ +1.07x faster │
│ QQuery 23    │ 11757.43ms │                       10161.49ms │ +1.16x faster │
│ QQuery 24    │   751.56ms │                         677.31ms │ +1.11x faster │
│ QQuery 25    │   661.76ms │                         583.94ms │ +1.13x faster │
│ QQuery 26    │   827.91ms │                         778.19ms │ +1.06x faster │
│ QQuery 27    │  2542.23ms │                        2090.62ms │ +1.22x faster │
│ QQuery 28    │ 15396.30ms │                       14334.92ms │ +1.07x faster │
│ QQuery 29    │   570.84ms │                         572.35ms │     no change │
│ QQuery 30    │  1188.89ms │                        1142.40ms │     no change │
│ QQuery 31    │  1243.82ms │                        1173.04ms │ +1.06x faster │
│ QQuery 32    │  4336.24ms │                        4174.83ms │     no change │
│ QQuery 33    │  5266.84ms │                        4255.56ms │ +1.24x faster │
│ QQuery 34    │  5350.35ms │                        4244.46ms │ +1.26x faster │
│ QQuery 35    │  1781.95ms │                        1788.24ms │     no change │
│ QQuery 36    │   319.41ms │                         284.84ms │ +1.12x faster │
│ QQuery 37    │   207.99ms │                         183.65ms │ +1.13x faster │
│ QQuery 38    │   194.13ms │                         184.99ms │     no change │
│ QQuery 39    │   768.26ms │                         816.16ms │  1.06x slower │
│ QQuery 40    │    84.86ms │                          85.84ms │     no change │
│ QQuery 41    │    84.45ms │                          79.66ms │ +1.06x faster │
│ QQuery 42    │    96.44ms │                          97.44ms │     no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)                             │ 83396.44ms │
│ Total Time (alamb_enable_string_view_by_default)   │ 78867.11ms │
│ Average Time (main_base)                           │  1939.45ms │
│ Average Time (alamb_enable_string_view_by_default) │  1834.12ms │
│ Queries Faster                                     │         18 │
│ Queries Slower                                     │          7 │
│ Queries with No Change                             │         18 │
└────────────────────────────────────────────────────┴────────────┘
--------------------

alamb avatar Oct 05 '24 13:10 alamb

clickbench_extended

Looking good here

--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_enable_string_view_by_def… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  2741.98ms │                        2616.63ms │     no change │
│ QQuery 1     │   759.57ms │                         681.31ms │ +1.11x faster │
│ QQuery 2     │  1625.61ms │                        1398.16ms │ +1.16x faster │
│ QQuery 3     │   680.35ms │                         683.42ms │     no change │
│ QQuery 4     │ 12760.38ms │                       12498.84ms │     no change │
│ QQuery 5     │ 19106.95ms │                       19157.16ms │     no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘

alamb avatar Oct 05 '24 13:10 alamb

clickbench_partitioned

Slowing down also due to https://github.com/apache/datafusion/issues/12509 / https://github.com/apache/datafusion/issues/12788

--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_enable_string_view_by_def… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.20ms │                           2.23ms │     no change │
│ QQuery 1     │    40.39ms │                          40.67ms │     no change │
│ QQuery 2     │    96.55ms │                          98.55ms │     no change │
│ QQuery 3     │    99.47ms │                          99.17ms │     no change │
│ QQuery 4     │   924.46ms │                         922.44ms │     no change │
│ QQuery 5     │   950.04ms │                         968.54ms │     no change │
│ QQuery 6     │    34.09ms │                          33.63ms │     no change │
│ QQuery 7     │    43.25ms │                          43.24ms │     no change │
│ QQuery 8     │  1366.70ms │                        1350.65ms │     no change │
│ QQuery 9     │  1328.80ms │                        1344.05ms │     no change │
│ QQuery 10    │   348.25ms │                         314.41ms │ +1.11x faster │
│ QQuery 11    │   395.56ms │                         360.36ms │ +1.10x faster │
│ QQuery 12    │  1096.67ms │                         968.45ms │ +1.13x faster │
│ QQuery 13    │  1749.70ms │                        1747.58ms │     no change │
│ QQuery 14    │  1209.17ms │                        1406.79ms │  1.16x slower │
│ QQuery 15    │  1085.42ms │                        1077.92ms │     no change │
│ QQuery 16    │  2491.88ms │                        2904.94ms │  1.17x slower │
│ QQuery 17    │  2297.53ms │                        2705.51ms │  1.18x slower │
│ QQuery 18    │  5009.23ms │                        5787.71ms │  1.16x slower │
│ QQuery 19    │    94.74ms │                          90.64ms │     no change │
│ QQuery 20    │  1697.79ms │                        1821.09ms │  1.07x slower │
│ QQuery 21    │  1997.89ms │                        1907.50ms │     no change │
│ QQuery 22    │  4709.43ms │                        4818.95ms │     no change │
│ QQuery 23    │ 10513.97ms │                       10145.30ms │     no change │
│ QQuery 24    │   583.88ms │                         486.20ms │ +1.20x faster │
│ QQuery 25    │   488.55ms │                         408.51ms │ +1.20x faster │
│ QQuery 26    │   657.96ms │                         566.02ms │ +1.16x faster │
│ QQuery 27    │  2529.00ms │                        2311.10ms │ +1.09x faster │
│ QQuery 28    │ 14698.76ms │                       26994.08ms │  1.84x slower │
│ QQuery 29    │   521.52ms │                         544.05ms │     no change │
│ QQuery 30    │  1043.00ms │                        1014.79ms │     no change │
│ QQuery 31    │  1111.27ms │                        1057.91ms │     no change │
│ QQuery 32    │  4308.74ms │                        4254.27ms │     no change │
│ QQuery 33    │  5276.47ms │                        4134.58ms │ +1.28x faster │
│ QQuery 34    │  5342.06ms │                        4150.13ms │ +1.29x faster │
│ QQuery 35    │  1905.15ms │                        1857.24ms │     no change │
│ QQuery 36    │   276.11ms │                         238.65ms │ +1.16x faster │
│ QQuery 37    │   120.85ms │                          87.17ms │ +1.39x faster │
│ QQuery 38    │   144.61ms │                         152.75ms │  1.06x slower │
│ QQuery 39    │   739.13ms │                         759.31ms │     no change │
│ QQuery 40    │    56.56ms │                          55.03ms │     no change │
│ QQuery 41    │    48.43ms │                          50.81ms │     no change │
│ QQuery 42    │    63.36ms │                          63.27ms │     no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘

alamb avatar Oct 05 '24 13:10 alamb

Current status: https://github.com/apache/datafusion/issues/11682#issuecomment-2396551895

alamb avatar Oct 07 '24 11:10 alamb

My plan is to pull the changes from the following PRs into this PR and rerun the overall perf test

  • [x] https://github.com/apache/datafusion/pull/12792
  • [ ] https://github.com/apache/datafusion/pull/12816 @goldmedal
  • [x] https://github.com/apache/datafusion/pull/12809 from @Rachelint

Hopefully that will show the performance we need. Then we just need to get the various PRs actually merged and we can do this 😅

alamb avatar Oct 13 '24 12:10 alamb

Here are the current performance results: 🚀 Basically 10% faster across all queries and no slow downs

👏 @Rachelint @jayzhan211 @XiangpengHao and many many others.

Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_enable_string_view_by_def… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.70ms │                           0.67ms │     no change │
│ QQuery 1     │    75.35ms │                          69.82ms │ +1.08x faster │
│ QQuery 2     │   123.15ms │                         128.37ms │     no change │
│ QQuery 3     │   129.02ms │                         128.14ms │     no change │
│ QQuery 4     │   947.97ms │                         967.71ms │     no change │
│ QQuery 5     │  1086.19ms │                        1060.35ms │     no change │
│ QQuery 6     │    65.49ms │                          66.43ms │     no change │
│ QQuery 7     │    80.72ms │                          84.38ms │     no change │
│ QQuery 8     │  1330.08ms │                        1332.41ms │     no change │
│ QQuery 9     │  1345.21ms │                        1344.89ms │     no change │
│ QQuery 10    │   443.35ms │                         335.50ms │ +1.32x faster │
│ QQuery 11    │   493.80ms │                         372.10ms │ +1.33x faster │
│ QQuery 12    │  1205.69ms │                        1101.01ms │ +1.10x faster │
│ QQuery 13    │  1921.43ms │                        1617.54ms │ +1.19x faster │
│ QQuery 14    │  1355.42ms │                        1146.40ms │ +1.18x faster │
│ QQuery 15    │  1106.21ms │                        1129.07ms │     no change │
│ QQuery 16    │  2546.11ms │                        2434.61ms │     no change │
│ QQuery 17    │  2358.85ms │                        2269.62ms │     no change │
│ QQuery 18    │  4944.14ms │                        5101.45ms │     no change │
│ QQuery 19    │   121.48ms │                         124.59ms │     no change │
│ QQuery 20    │  1637.28ms │                        1377.14ms │ +1.19x faster │
│ QQuery 21    │  2106.73ms │                        1716.14ms │ +1.23x faster │
│ QQuery 22    │  5050.72ms │                        4231.71ms │ +1.19x faster │
│ QQuery 23    │ 12149.30ms │                        9998.89ms │ +1.22x faster │
│ QQuery 24    │   793.52ms │                         683.81ms │ +1.16x faster │
│ QQuery 25    │   700.98ms │                         587.52ms │ +1.19x faster │
│ QQuery 26    │   860.57ms │                         725.23ms │ +1.19x faster │
│ QQuery 27    │  2540.34ms │                        2104.51ms │ +1.21x faster │
│ QQuery 28    │ 14015.68ms │                       14587.90ms │     no change │
│ QQuery 29    │   571.20ms │                         554.32ms │     no change │
│ QQuery 30    │  1239.94ms │                        1118.25ms │ +1.11x faster │
│ QQuery 31    │  1276.58ms │                        1189.58ms │ +1.07x faster │
│ QQuery 32    │  4230.09ms │                        4204.57ms │     no change │
│ QQuery 33    │  5223.33ms │                        4283.46ms │ +1.22x faster │
│ QQuery 34    │  5328.12ms │                        4264.81ms │ +1.25x faster │
│ QQuery 35    │  1754.78ms │                        1797.85ms │     no change │
│ QQuery 36    │   306.94ms │                         287.75ms │ +1.07x faster │
│ QQuery 37    │   220.70ms │                         182.40ms │ +1.21x faster │
│ QQuery 38    │   192.66ms │                         188.30ms │     no change │
│ QQuery 39    │   790.63ms │                         551.89ms │ +1.43x faster │
│ QQuery 40    │    86.56ms │                          84.86ms │     no change │
│ QQuery 41    │    83.08ms │                          77.75ms │ +1.07x faster │
│ QQuery 42    │    93.76ms │                          94.03ms │     no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)                             │ 82933.83ms │
│ Total Time (alamb_enable_string_view_by_default)   │ 75707.73ms │
│ Average Time (main_base)                           │  1928.69ms │
│ Average Time (alamb_enable_string_view_by_default) │  1760.64ms │
│ Queries Faster                                     │         22 │
│ Queries Slower                                     │          0 │
│ Queries with No Change                             │         21 │
└────────────────────────────────────────────────────┴────────────┘

Also for partitioned:

--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_enable_string_view_by_def… ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.24ms │                           2.22ms │     no change │
│ QQuery 1     │    37.96ms │                          38.15ms │     no change │
│ QQuery 2     │    95.79ms │                          94.43ms │     no change │
│ QQuery 3     │   102.41ms │                         100.80ms │     no change │
│ QQuery 4     │   927.15ms │                         930.56ms │     no change │
│ QQuery 5     │   973.96ms │                         951.87ms │     no change │
│ QQuery 6     │    34.33ms │                          33.58ms │     no change │
│ QQuery 7     │    42.82ms │                          41.29ms │     no change │
│ QQuery 8     │  1364.76ms │                        1344.57ms │     no change │
│ QQuery 9     │  1309.61ms │                        1365.89ms │     no change │
│ QQuery 10    │   346.30ms │                         307.70ms │ +1.13x faster │
│ QQuery 11    │   394.60ms │                         347.22ms │ +1.14x faster │
│ QQuery 12    │  1097.07ms │                        1007.94ms │ +1.09x faster │
│ QQuery 13    │  1636.51ms │                        1602.93ms │     no change │
│ QQuery 14    │  1197.28ms │                        1069.49ms │ +1.12x faster │
│ QQuery 15    │  1068.82ms │                        1077.73ms │     no change │
│ QQuery 16    │  2485.04ms │                        2398.93ms │     no change │
│ QQuery 17    │  2275.37ms │                        2212.18ms │     no change │
│ QQuery 18    │  4952.47ms │                        5114.44ms │     no change │
│ QQuery 19    │    96.01ms │                          92.62ms │     no change │
│ QQuery 20    │  1719.75ms │                        1254.03ms │ +1.37x faster │
│ QQuery 21    │  2001.12ms │                        1481.53ms │ +1.35x faster │
│ QQuery 22    │  5169.50ms │                        2649.07ms │ +1.95x faster │
│ QQuery 23    │ 10586.32ms │                        8619.20ms │ +1.23x faster │
│ QQuery 24    │   594.74ms │                         517.28ms │ +1.15x faster │
│ QQuery 25    │   495.22ms │                         427.98ms │ +1.16x faster │
│ QQuery 26    │   660.64ms │                         565.71ms │ +1.17x faster │
│ QQuery 27    │  2557.47ms │                        1846.03ms │ +1.39x faster │
│ QQuery 28    │ 13399.41ms │                       13583.18ms │     no change │
│ QQuery 29    │   535.81ms │                         524.10ms │     no change │
│ QQuery 30    │  1034.30ms │                         964.20ms │ +1.07x faster │
│ QQuery 31    │  1101.14ms │                        1035.81ms │ +1.06x faster │
│ QQuery 32    │  4202.55ms │                        4049.53ms │     no change │
│ QQuery 33    │  5136.86ms │                        3916.87ms │ +1.31x faster │
│ QQuery 34    │  5104.18ms │                        3932.13ms │ +1.30x faster │
│ QQuery 35    │  1909.94ms │                        1797.84ms │ +1.06x faster │
│ QQuery 36    │   264.07ms │                         225.04ms │ +1.17x faster │
│ QQuery 37    │   121.76ms │                          92.57ms │ +1.32x faster │
│ QQuery 38    │   147.92ms │                         131.16ms │ +1.13x faster │
│ QQuery 39    │   749.30ms │                         477.60ms │ +1.57x faster │
│ QQuery 40    │    56.82ms │                          55.17ms │     no change │
│ QQuery 41    │    47.54ms │                          45.12ms │ +1.05x faster │
│ QQuery 42    │    63.14ms │                          64.81ms │     no change │
└──────────────┴────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                  ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main_base)                             │ 78099.98ms │
│ Total Time (alamb_enable_string_view_by_default)   │ 68390.51ms │
│ Average Time (main_base)                           │  1816.28ms │
│ Average Time (alamb_enable_string_view_by_default) │  1590.48ms │
│ Queries Faster                                     │         22 │
│ Queries Slower                                     │          0 │
│ Queries with No Change                             │         21 │
└────────────────────────────────────────────────────┴────────────┘

alamb avatar Oct 14 '24 11:10 alamb

Next steps are to get the various PRs merged, get this one ready, and then write all about it.

alamb avatar Oct 14 '24 11:10 alamb

I made a new PR as this one has lots of now irrelevant historical context

New PR here: https://github.com/apache/datafusion/pull/13101

alamb avatar Oct 24 '24 20:10 alamb