[SPARK-48177][BUILD] Upgrade `Apache Parquet` to 1.14.1
What changes were proposed in this pull request?
Why are the changes needed?
Fixes quite a few bugs on the Parquet side: https://github.com/apache/parquet-mr/blob/master/CHANGES.md#version-1140
Does this PR introduce any user-facing change?
No
How was this patch tested?
Using the existing unit tests
Was this patch authored or co-authored using generative AI tooling?
No
cc @cloud-fan, @HyukjinKwon, @mridulm, @sunchao, @yaooqinn, @LuciferYang, @steveloughran, @viirya, @huaxin, @parthchandra, too.
Oh, it seems that files from the wrong target folder were added.
FYI, this PR is supposed to have two files: pom.xml and dev/deps/spark-deps-hadoop-3-hive-2.3.
Thanks for pointing that out, @dongjoon-hyun. I've fixed it right away 👍
I have to look into the tests 👀
I think the toPrettyJSON errors seen here are reported in PARQUET-2468 and are being addressed in https://github.com/apache/parquet-mr/pull/1349. We might have to wait for 1.14.1.
Cause: java.lang.RuntimeException: shaded.parquet.com.fasterxml.jackson.databind.exc.InvalidDefinitionException: No serializer found for class org.apache.parquet.schema.LogicalTypeAnnotation$StringLogicalTypeAnnotation and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationFeature.FAIL_ON_EMPTY_BEANS) (through reference chain: org.apache.parquet.hadoop.metadata.ParquetMetadata["fileMetaData"]->org.apache.parquet.hadoop.metadata.FileMetaData["schema"]->org.apache.parquet.schema.MessageType["fields"]->java.util.ArrayList[1]->org.apache.parquet.schema.PrimitiveType["logicalTypeAnnotation"])
at org.apache.parquet.hadoop.metadata.ParquetMetadata.toJSON(ParquetMetadata.java:68)
at org.apache.parquet.hadoop.metadata.ParquetMetadata.toPrettyJSON(ParquetMetadata.java:48)
at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1592)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:629)
Caused by: shaded.parquet.com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Java 8 optional type `java.util.Optional<java.lang.Long>` not supported by default: add Module "shaded.parquet.com.fasterxml.jackson.datatype:jackson-datatype-jdk8" to enable handling (through reference chain: org.apache.parquet.hadoop.metadata.ParquetMetadata["blocks"]->java.util.ArrayList[0]->org.apache.parquet.hadoop.metadata.BlockMetaData["columns"]->java.util.Collections$UnmodifiableRandomAccessList[0]->org.apache.parquet.hadoop.metadata.IntColumnChunkMetaData["sizeStatistics"]->org.apache.parquet.column.statistics.SizeStatistics["unencodedByteArrayDataBytes"])
at shaded.parquet.com.fasterxml.jackson.databind.exc.InvalidDefinitionException.from(InvalidDefinitionException.java:77)
...
at shaded.parquet.com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:1114)
at org.apache.parquet.hadoop.metadata.ParquetMetadata.toJSON(ParquetMetadata.java:62)
at org.apache.parquet.hadoop.metadata.ParquetMetadata.toPrettyJSON(ParquetMetadata.java:48)
at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1592)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:629)
Thanks for digging into this @rshkv, let's follow up on the Parquet side
Thank you, @rshkv and @Fokko .
Apache Parquet 1.14.1 has been released, thanks @wgtmac 🙌
Thank you, @Fokko and @wgtmac .
Could you make CI happy, @Fokko ?
[info] - SPARK-30269 failed to update partition stats if it's equal to table's old stats *** FAILED *** (414 milliseconds)
[info] 690 did not equal 657 (StatisticsSuite.scala:1610)
[info] - Runtime bloom filter join: BF rewrite triggering threshold test *** FAILED *** (1 second, 469 milliseconds)
[info] 2 did not equal 0 (InjectRuntimeFilterSuite.scala:248)
[info] - primitive type - no column index *** FAILED *** (12 milliseconds)
[info] java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
[info] at org.apache.parquet.column.statistics.SizeStatistics$Builder.add(SizeStatistics.java:83)
[info] at org.apache.parquet.column.statistics.SizeStatistics$Builder.add(SizeStatistics.java:95)
[info] at org.apache.parquet.column.impl.ColumnValueCollector.write(ColumnValueCollector.java:92)
[info] at org.apache.parquet.column.impl.ColumnWriterBase.write(ColumnWriterBase.java:197)
[info] at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$writeDataPage$1(ParquetVectorizedSuite.scala:607)
[info] at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$writeDataPage$1$adapted(ParquetVectorizedSuite.scala:591)
[info] at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619)
[info] at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617)
[info] at scala.collection.AbstractIterable.foreach(Iterable.scala:935)
[info] at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.writeDataPage(ParquetVectorizedSuite.scala:591)
[info] at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$testPrimitiveString$4(ParquetVectorizedSuite.scala:515)
[info] at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.scala:18)
[info] at scala.collection.immutable.List.foreach(List.scala:334)
[info] at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.testPrimitiveString(ParquetVectorizedSuite.scala:511)
[info] at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$new$4(ParquetVectorizedSuite.scala:62)
[info] at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$new$4$adapted(ParquetVectorizedSuite.scala:60)
[info] at scala.collection.immutable.List.foreach(List.scala:334)
[info] at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$new$3(ParquetVectorizedSuite.scala:60)
[info] at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$new$3$adapted(ParquetVectorizedSuite.scala:59)
[info] at scala.collection.immutable.List.foreach(List.scala:334)
[info] at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$new$2(ParquetVectorizedSuite.scala:59)
[info] at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.scala:18)
[info] at scala.collection.immutable.List.foreach(List.scala:334)
[info] at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorizedSuite.$anonfun$new$1(ParquetVectorizedSuite.scala:58)
1.14.1 still seems to have this error when writing statistics. Does this indicate an incompatibility?
@LuciferYang Could you please check the test case? It seems to be writing def_level=1 to a column with max_def_level=0.
This is an existing test case in Spark; the error does not occur with version 1.13.x.
Yes, I know that. The exception is thrown while building size statistics, a new feature that has already caught similar issues in parquet-mr's own test cases. So I'd suggest checking whether the existing test violates the rule 0 <= def_level <= max_def_level.
These lines are suspicious: https://github.com/apache/spark/blob/05c87e51a5e50d1c156211848693b66937f12a8f/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetVectorizedSuite.scala#L501-L505
If inputValues does not contain any null, maxDef is set to 0. However, definitionLevels is set to 1 for every non-null value, which violates exactly the rule I mentioned.
@wgtmac Thank you for your explanation; it seems you are correct. Should line 505 be changed from
val definitionLevels = inputValues.map(v => if (v == null) 0 else 1)
to
val definitionLevels = inputValues.map(v => if (v == null) 0 else maxDef)
? I tested it manually, and ParquetVectorizedSuite passes with this change.
Yes, that change looks reasonable. Thanks for verification! @LuciferYang
(I have to admit that it is a little bit aggressive to enable a new feature by default on the parquet side, sigh)
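The rule discussed above, 0 <= def_level <= max_def_level, can be sketched in isolation. This is a minimal illustration, not the Spark test code: the object name `DefinitionLevels` and both helper names are hypothetical, and it assumes a flat string column where `maxDef` is 1 only when the input contains a null, as described in the thread.

```scala
object DefinitionLevels {
  // Derive maxDef and per-value definition levels for a flat column.
  // Assumption: maxDef is 1 if any value is null (optional column), else 0.
  def definitionLevels(values: Seq[String]): (Int, Seq[Int]) = {
    val maxDef = if (values.exists(_ == null)) 1 else 0
    // Non-null values get maxDef (not a hard-coded 1); nulls get 0.
    val levels = values.map(v => if (v == null) 0 else maxDef)
    (maxDef, levels)
  }

  // Check the invariant 0 <= def_level <= max_def_level for every value.
  def isValid(maxDef: Int, levels: Seq[Int]): Boolean =
    levels.forall(l => l >= 0 && l <= maxDef)
}
```

With no nulls in the input, `maxDef` is 0 and every level must be 0, so the old hard-coded `1` fails `isValid`, which matches the `ArrayIndexOutOfBoundsException` raised from `SizeStatistics$Builder.add`.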
@LuciferYang Thanks for the pointer, I've updated the PR 👍
Could you make CI happy, @Fokko ?
[info] - SPARK-30269 failed to update partition stats if it's equal to table's old stats *** FAILED *** (414 milliseconds)
[info] 690 did not equal 657 (StatisticsSuite.scala:1610)
[info] - Runtime bloom filter join: BF rewrite triggering threshold test *** FAILED *** (1 second, 469 milliseconds)
[info] 2 did not equal 0 (InjectRuntimeFilterSuite.scala:248)
@Fokko It seems that the data written by 1.14.1 is larger than that by 1.13.1.
https://github.com/apache/spark/blob/b77caf776368154096442965b4a885c4a702d27f/sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala#L1606
The expectedSize needs to be changed to 690.
https://github.com/apache/spark/blob/b77caf776368154096442965b4a885c4a702d27f/sql/core/src/test/scala/org/apache/spark/sql/InjectRuntimeFilterSuite.scala#L487-L503
The log on line 489 needs to be fixed: the statement "bf5filtered has 14168 bytes and bf2 has 3409 bytes" is likely no longer accurate. The threshold on line 498 can be changed to 16000; the exact value is 15049.
@LuciferYang Thanks again for the elaborate pointers. I just switched jobs and got a new laptop, so I have to reconfigure everything :) I'll keep an eye on the CI
Looks very promising! Thanks all for the work! The failed tests do not seem related; I just re-triggered the CI jobs to be sure.
One nice-to-have is a quick perf comparison using benchmarks like DataSourceReadBenchmark and DataSourceWriteBenchmark, just to make sure there is no regression.
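To compare two benchmark runs case by case, the result rows can be parsed from the sbt output. The sketch below is only an illustration and not part of Spark: `BenchmarkRows`, `BenchRow`, `parse`, and `slowdown` are hypothetical names, and it assumes each result row ends with six whitespace-separated numeric columns (Best, Avg, Stdev, Rate, Per Row, Relative), with everything before them being the case name.

```scala
object BenchmarkRows {
  // One result row of a benchmark table (column meanings taken from the header line).
  final case class BenchRow(name: String, bestMs: Double, avgMs: Double,
      stdevMs: Double, rateMPerSec: Double, perRowNs: Double, relative: Double)

  // Parse a line such as "[info] SQL CSV 5677 5685 12 2.8 360.9 1.0X".
  // Returns None for headers, separators, and other non-result log lines.
  def parse(line: String): Option[BenchRow] = {
    val tokens = line.trim.stripPrefix("[info]").trim.split("\\s+").toVector
    if (tokens.length < 7) None
    else {
      val (nameParts, cols) = tokens.splitAt(tokens.length - 6)
      // The Relative column carries a trailing "X", e.g. "134.3X".
      val nums = cols.map(c => scala.util.Try(c.stripSuffix("X").toDouble).toOption)
      if (nums.exists(_.isEmpty)) None
      else {
        val n = nums.flatten
        Some(BenchRow(nameParts.mkString(" "), n(0), n(1), n(2), n(3), n(4), n(5)))
      }
    }
  }

  // Ratio of best times; a value above 1.0 means the candidate run was slower.
  def slowdown(baseline: BenchRow, candidate: BenchRow): Double =
    candidate.bestMs / baseline.bestMs
}
```

Feeding the rows for the same case from the main-branch run and from this branch into `slowdown` gives a per-case ratio. Small differences should be read against the reported Stdev column, since these runs are noisy.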
Has anyone set up a nightly Jenkins job that runs stable Spark and its tests against a nightly build of Parquet? That would seem a good way to catch regressions early, provided the test failures get attention. That's always a problem with cross-project builds.
@Fokko Do you have time to move this PR forward?
@LuciferYang Yes, let me get right to it!
Sorry for the long wait, that's quite a comprehensive test suite. I've run the benchmarks both on the main branch and this branch:
This branch
DataSourceReadBenchmark
[info] running (fork) org.apache.spark.sql.execution.benchmark.DataSourceReadBenchmark
[error] WARNING: Using incubator modules: jdk.incubator.vector
[info] 10:54:40.855 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] Running benchmark: SQL Single BOOLEAN Column Scan
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 11370 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 7099 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 42 iterations, 2032 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 39 iterations, 2033 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 2695 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 2472 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 37 iterations, 2048 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 2723 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single BOOLEAN Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 5677 5685 12 2.8 360.9 1.0X
[info] SQL Json 3517 3550 46 4.5 223.6 1.6X
[info] SQL Parquet Vectorized: DataPageV1 42 48 5 372.1 2.7 134.3X
[info] SQL Parquet Vectorized: DataPageV2 45 52 6 350.1 2.9 126.4X
[info] SQL Parquet MR: DataPageV1 1347 1348 0 11.7 85.7 4.2X
[info] SQL Parquet MR: DataPageV2 1220 1236 23 12.9 77.6 4.7X
[info] SQL ORC Vectorized 52 55 3 300.6 3.3 108.5X
[info] SQL ORC MR 1306 1362 78 12.0 83.0 4.3X
[info] Running benchmark: Parquet Reader Single BOOLEAN Column Scan
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 45 iterations, 2010 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 34 iterations, 2063 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV1
[info] Stopped after 70 iterations, 2000 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV2
[info] Stopped after 60 iterations, 2007 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single BOOLEAN Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1 41 45 3 382.7 2.6 1.0X
[info] ParquetReader Vectorized: DataPageV2 51 61 9 308.2 3.2 0.8X
[info] ParquetReader Vectorized -> Row: DataPageV1 22 29 4 717.9 1.4 1.9X
[info] ParquetReader Vectorized -> Row: DataPageV2 30 33 3 533.1 1.9 1.4X
[info] Running benchmark: SQL Single TINYINT Column Scan
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 12625 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 8105 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 32 iterations, 2040 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 33 iterations, 2036 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3119 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 2517 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 31 iterations, 2060 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3078 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single TINYINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 6208 6313 148 2.5 394.7 1.0X
[info] SQL Json 4022 4053 43 3.9 255.7 1.5X
[info] SQL Parquet Vectorized: DataPageV1 56 64 8 282.6 3.5 111.5X
[info] SQL Parquet Vectorized: DataPageV2 58 62 6 273.2 3.7 107.8X
[info] SQL Parquet MR: DataPageV1 1538 1560 31 10.2 97.8 4.0X
[info] SQL Parquet MR: DataPageV2 1255 1259 6 12.5 79.8 4.9X
[info] SQL ORC Vectorized 58 66 7 272.8 3.7 107.7X
[info] SQL ORC MR 1519 1539 29 10.4 96.6 4.1X
[info] Running benchmark: Parquet Reader Single TINYINT Column Scan
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 30 iterations, 2052 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 30 iterations, 2009 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV1
[info] Stopped after 66 iterations, 2017 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV2
[info] Stopped after 70 iterations, 2023 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single TINYINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1 66 68 1 239.4 4.2 1.0X
[info] ParquetReader Vectorized: DataPageV2 65 67 2 242.9 4.1 1.0X
[info] ParquetReader Vectorized -> Row: DataPageV1 25 31 1 624.0 1.6 2.6X
[info] ParquetReader Vectorized -> Row: DataPageV2 25 29 2 628.7 1.6 2.6X
[info] Running benchmark: SQL Single SMALLINT Column Scan
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 12916 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 8438 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 27 iterations, 2061 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 26 iterations, 2070 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3174 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 2953 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 23 iterations, 2056 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 2881 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single SMALLINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 6439 6458 28 2.4 409.4 1.0X
[info] SQL Json 4186 4219 48 3.8 266.1 1.5X
[info] SQL Parquet Vectorized: DataPageV1 66 76 13 239.7 4.2 98.1X
[info] SQL Parquet Vectorized: DataPageV2 72 80 7 218.8 4.6 89.6X
[info] SQL Parquet MR: DataPageV1 1550 1587 52 10.1 98.6 4.2X
[info] SQL Parquet MR: DataPageV2 1453 1477 34 10.8 92.4 4.4X
[info] SQL ORC Vectorized 84 89 5 186.7 5.4 76.4X
[info] SQL ORC MR 1439 1441 3 10.9 91.5 4.5X
[info] Running benchmark: Parquet Reader Single SMALLINT Column Scan
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 24 iterations, 2016 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 21 iterations, 2001 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV1
[info] Stopped after 25 iterations, 2049 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV2
[info] Stopped after 22 iterations, 2052 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single SMALLINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1 80 84 4 195.6 5.1 1.0X
[info] ParquetReader Vectorized: DataPageV2 92 95 3 170.4 5.9 0.9X
[info] ParquetReader Vectorized -> Row: DataPageV1 81 82 1 194.6 5.1 1.0X
[info] ParquetReader Vectorized -> Row: DataPageV2 92 93 1 171.0 5.8 0.9X
[info] Running benchmark: SQL Single INT Column Scan
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 13074 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 8973 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 29 iterations, 2003 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 18 iterations, 2017 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3067 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 2820 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 18 iterations, 2054 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 2859 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single INT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 6436 6537 143 2.4 409.2 1.0X
[info] SQL Json 4486 4487 2 3.5 285.2 1.4X
[info] SQL Parquet Vectorized: DataPageV1 59 69 18 268.3 3.7 109.8X
[info] SQL Parquet Vectorized: DataPageV2 106 112 5 148.4 6.7 60.7X
[info] SQL Parquet MR: DataPageV1 1528 1534 9 10.3 97.1 4.2X
[info] SQL Parquet MR: DataPageV2 1402 1410 11 11.2 89.1 4.6X
[info] SQL ORC Vectorized 110 114 4 143.5 7.0 58.7X
[info] SQL ORC MR 1411 1430 26 11.1 89.7 4.6X
[info] Running benchmark: Parquet Reader Single INT Column Scan
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 22 iterations, 2054 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 14 iterations, 2048 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV1
[info] Stopped after 26 iterations, 2019 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV2
[info] Stopped after 17 iterations, 2115 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single INT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1 92 93 1 171.6 5.8 1.0X
[info] ParquetReader Vectorized: DataPageV2 142 146 6 111.0 9.0 0.6X
[info] ParquetReader Vectorized -> Row: DataPageV1 76 78 2 206.0 4.9 1.2X
[info] ParquetReader Vectorized -> Row: DataPageV2 123 124 1 128.0 7.8 0.7X
[info] Running benchmark: SQL Single BIGINT Column Scan
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 13054 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 8794 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 13 iterations, 2079 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 16 iterations, 2027 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3519 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3042 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 18 iterations, 2076 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3202 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single BIGINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 6451 6527 109 2.4 410.1 1.0X
[info] SQL Json 4394 4397 4 3.6 279.4 1.5X
[info] SQL Parquet Vectorized: DataPageV1 142 160 15 110.7 9.0 45.4X
[info] SQL Parquet Vectorized: DataPageV2 119 127 6 132.2 7.6 54.2X
[info] SQL Parquet MR: DataPageV1 1746 1760 19 9.0 111.0 3.7X
[info] SQL Parquet MR: DataPageV2 1499 1521 32 10.5 95.3 4.3X
[info] SQL ORC Vectorized 108 115 8 145.2 6.9 59.5X
[info] SQL ORC MR 1580 1601 29 10.0 100.5 4.1X
[info] Running benchmark: Parquet Reader Single BIGINT Column Scan
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 12 iterations, 2136 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 13 iterations, 2071 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV1
[info] Stopped after 14 iterations, 2151 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV2
[info] Stopped after 15 iterations, 2075 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single BIGINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1 172 178 10 91.6 10.9 1.0X
[info] ParquetReader Vectorized: DataPageV2 158 159 1 99.6 10.0 1.1X
[info] ParquetReader Vectorized -> Row: DataPageV1 152 154 2 103.4 9.7 1.1X
[info] ParquetReader Vectorized -> Row: DataPageV2 137 138 1 114.9 8.7 1.3X
[info] Running benchmark: SQL Single FLOAT Column Scan
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 13474 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 9857 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 26 iterations, 2006 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 30 iterations, 2048 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3290 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3089 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 17 iterations, 2025 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3319 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single FLOAT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 6650 6737 124 2.4 422.8 1.0X
[info] SQL Json 4923 4929 8 3.2 313.0 1.4X
[info] SQL Parquet Vectorized: DataPageV1 60 77 22 264.0 3.8 111.6X
[info] SQL Parquet Vectorized: DataPageV2 58 68 9 270.8 3.7 114.5X
[info] SQL Parquet MR: DataPageV1 1633 1645 18 9.6 103.8 4.1X
[info] SQL Parquet MR: DataPageV2 1543 1545 3 10.2 98.1 4.3X
[info] SQL ORC Vectorized 113 119 5 139.1 7.2 58.8X
[info] SQL ORC MR 1659 1660 1 9.5 105.5 4.0X
[info] Running benchmark: Parquet Reader Single FLOAT Column Scan
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 22 iterations, 2040 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 22 iterations, 2010 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV1
[info] Stopped after 27 iterations, 2048 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV2
[info] Stopped after 27 iterations, 2048 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single FLOAT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1 90 93 1 175.1 5.7 1.0X
[info] ParquetReader Vectorized: DataPageV2 88 91 2 178.1 5.6 1.0X
[info] ParquetReader Vectorized -> Row: DataPageV1 72 76 3 217.2 4.6 1.2X
[info] ParquetReader Vectorized -> Row: DataPageV2 72 76 3 217.6 4.6 1.2X
[info] Running benchmark: SQL Single DOUBLE Column Scan
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 14018 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 10027 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 14 iterations, 2079 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 14 iterations, 2021 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3516 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3553 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 7 iterations, 2210 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3581 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single DOUBLE Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 6969 7009 57 2.3 443.1 1.0X
[info] SQL Json 5001 5014 18 3.1 318.0 1.4X
[info] SQL Parquet Vectorized: DataPageV1 143 149 6 109.7 9.1 48.6X
[info] SQL Parquet Vectorized: DataPageV2 140 144 2 112.0 8.9 49.6X
[info] SQL Parquet MR: DataPageV1 1709 1758 69 9.2 108.7 4.1X
[info] SQL Parquet MR: DataPageV2 1710 1777 95 9.2 108.7 4.1X
[info] SQL ORC Vectorized 311 316 4 50.5 19.8 22.4X
[info] SQL ORC MR 1779 1791 16 8.8 113.1 3.9X
[info] Running benchmark: Parquet Reader Single DOUBLE Column Scan
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 12 iterations, 2041 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 12 iterations, 2049 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV1
[info] Stopped after 13 iterations, 2010 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV2
[info] Stopped after 13 iterations, 2004 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single DOUBLE Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1 168 170 2 93.6 10.7 1.0X
[info] ParquetReader Vectorized: DataPageV2 168 171 2 93.6 10.7 1.0X
[info] ParquetReader Vectorized -> Row: DataPageV1 153 155 1 102.7 9.7 1.1X
[info] ParquetReader Vectorized -> Row: DataPageV2 152 154 1 103.2 9.7 1.1X
[info] Running benchmark: SQL Single TINYINT Column Scan in Struct
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3814 ms
[info] Running case: SQL ORC Vectorized (Nested Column Disabled)
[info] Stopped after 2 iterations, 3902 ms
[info] Running case: SQL ORC Vectorized (Nested Column Enabled)
[info] Stopped after 11 iterations, 2028 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3372 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3688 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info] Stopped after 29 iterations, 2007 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3113 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3562 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info] Stopped after 31 iterations, 2021 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single TINYINT Column Scan in Struct: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR 1793 1907 161 8.8 114.0 1.0X
[info] SQL ORC Vectorized (Nested Column Disabled) 1781 1951 241 8.8 113.2 1.0X
[info] SQL ORC Vectorized (Nested Column Enabled) 166 184 20 94.7 10.6 10.8X
[info] SQL Parquet MR: DataPageV1 1658 1686 40 9.5 105.4 1.1X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 1838 1844 9 8.6 116.8 1.0X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 58 69 12 269.6 3.7 30.7X
[info] SQL Parquet MR: DataPageV2 1533 1557 34 10.3 97.5 1.2X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 1775 1781 9 8.9 112.8 1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 58 65 7 272.1 3.7 31.0X
[info] Running benchmark: SQL Single SMALLINT Column Scan in Struct
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3021 ms
[info] Running case: SQL ORC Vectorized (Nested Column Disabled)
[info] Stopped after 2 iterations, 2975 ms
[info] Running case: SQL ORC Vectorized (Nested Column Enabled)
[info] Stopped after 11 iterations, 2063 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3388 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3653 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info] Stopped after 29 iterations, 2069 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3165 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3454 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info] Stopped after 18 iterations, 2053 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single SMALLINT Column Scan in Struct: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR 1468 1511 60 10.7 93.4 1.0X
[info] SQL ORC Vectorized (Nested Column Disabled) 1484 1488 5 10.6 94.4 1.0X
[info] SQL ORC Vectorized (Nested Column Enabled) 184 188 5 85.7 11.7 8.0X
[info] SQL Parquet MR: DataPageV1 1690 1694 6 9.3 107.4 0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 1819 1827 11 8.6 115.6 0.8X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 61 71 8 255.9 3.9 23.9X
[info] SQL Parquet MR: DataPageV2 1581 1583 3 9.9 100.5 0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 1724 1727 4 9.1 109.6 0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 107 114 8 147.4 6.8 13.8X
[info] Running benchmark: SQL Single INT Column Scan in Struct
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3417 ms
[info] Running case: SQL ORC Vectorized (Nested Column Disabled)
[info] Stopped after 2 iterations, 3437 ms
[info] Running case: SQL ORC Vectorized (Nested Column Enabled)
[info] Stopped after 10 iterations, 2103 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3545 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3797 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info] Stopped after 28 iterations, 2035 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3402 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3709 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info] Stopped after 22 iterations, 2040 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single INT Column Scan in Struct: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR 1703 1709 9 9.2 108.3 1.0X
[info] SQL ORC Vectorized (Nested Column Disabled) 1709 1719 14 9.2 108.7 1.0X
[info] SQL ORC Vectorized (Nested Column Enabled) 194 210 34 80.9 12.4 8.8X
[info] SQL Parquet MR: DataPageV1 1754 1773 27 9.0 111.5 1.0X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 1897 1899 2 8.3 120.6 0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 61 73 9 258.6 3.9 28.0X
[info] SQL Parquet MR: DataPageV2 1692 1701 12 9.3 107.6 1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 1854 1855 1 8.5 117.9 0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 85 93 4 184.2 5.4 19.9X
[info] Running benchmark: SQL Single BIGINT Column Scan in Struct
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3061 ms
[info] Running case: SQL ORC Vectorized (Nested Column Disabled)
[info] Stopped after 2 iterations, 3032 ms
[info] Running case: SQL ORC Vectorized (Nested Column Enabled)
[info] Stopped after 11 iterations, 2150 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3677 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3977 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info] Stopped after 14 iterations, 2056 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3255 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3508 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info] Stopped after 15 iterations, 2005 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single BIGINT Column Scan in Struct: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR 1526 1531 6 10.3 97.0 1.0X
[info] SQL ORC Vectorized (Nested Column Disabled) 1515 1516 1 10.4 96.4 1.0X
[info] SQL ORC Vectorized (Nested Column Enabled) 190 195 5 82.8 12.1 8.0X
[info] SQL Parquet MR: DataPageV1 1821 1839 25 8.6 115.8 0.8X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 1969 1989 28 8.0 125.2 0.8X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 143 147 4 110.1 9.1 10.7X
[info] SQL Parquet MR: DataPageV2 1617 1628 15 9.7 102.8 0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 1754 1754 1 9.0 111.5 0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 120 134 10 130.6 7.7 12.7X
[info] Running benchmark: SQL Single FLOAT Column Scan in Struct
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 2902 ms
[info] Running case: SQL ORC Vectorized (Nested Column Disabled)
[info] Stopped after 2 iterations, 2906 ms
[info] Running case: SQL ORC Vectorized (Nested Column Enabled)
[info] Stopped after 11 iterations, 2140 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3582 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3938 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info] Stopped after 28 iterations, 2025 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3295 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3646 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info] Stopped after 28 iterations, 2001 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single FLOAT Column Scan in Struct: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR 1443 1451 11 10.9 91.8 1.0X
[info] SQL ORC Vectorized (Nested Column Disabled) 1440 1453 19 10.9 91.5 1.0X
[info] SQL ORC Vectorized (Nested Column Enabled) 184 195 9 85.4 11.7 7.8X
[info] SQL Parquet MR: DataPageV1 1778 1791 19 8.8 113.0 0.8X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 1947 1969 31 8.1 123.8 0.7X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 63 72 5 248.4 4.0 22.8X
[info] SQL Parquet MR: DataPageV2 1648 1648 0 9.5 104.8 0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 1820 1823 5 8.6 115.7 0.8X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 65 71 9 242.4 4.1 22.2X
[info] Running benchmark: SQL Single DOUBLE Column Scan in Struct
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3537 ms
[info] Running case: SQL ORC Vectorized (Nested Column Disabled)
[info] Stopped after 2 iterations, 3609 ms
[info] Running case: SQL ORC Vectorized (Nested Column Enabled)
[info] Stopped after 6 iterations, 2320 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3647 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3956 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info] Stopped after 14 iterations, 2071 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3463 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3819 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info] Stopped after 14 iterations, 2092 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single DOUBLE Column Scan in Struct: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR 1763 1769 8 8.9 112.1 1.0X
[info] SQL ORC Vectorized (Nested Column Disabled) 1791 1805 20 8.8 113.8 1.0X
[info] SQL ORC Vectorized (Nested Column Enabled) 377 387 10 41.7 24.0 4.7X
[info] SQL Parquet MR: DataPageV1 1809 1824 20 8.7 115.0 1.0X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 1977 1978 1 8.0 125.7 0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 146 148 2 107.9 9.3 12.1X
[info] SQL Parquet MR: DataPageV2 1718 1732 19 9.2 109.2 1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 1905 1910 7 8.3 121.1 0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 146 149 3 107.5 9.3 12.1X
[info] Running benchmark: SQL Nested Column Scan
[info] Running case: SQL ORC MR
[info] Stopped after 10 iterations, 62690 ms
[info] Running case: SQL ORC Vectorized (Nested Column Disabled)
[info] Stopped after 10 iterations, 61859 ms
[info] Running case: SQL ORC Vectorized (Nested Column Enabled)
[info] Stopped after 10 iterations, 23893 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 10 iterations, 41404 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info] Stopped after 10 iterations, 43456 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info] Stopped after 10 iterations, 22753 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 10 iterations, 46835 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info] Stopped after 10 iterations, 49182 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info] Stopped after 10 iterations, 18211 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Nested Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR 6146 6269 78 0.2 5861.1 1.0X
[info] SQL ORC Vectorized (Nested Column Disabled) 6024 6186 116 0.2 5745.0 1.0X
[info] SQL ORC Vectorized (Nested Column Enabled) 2363 2389 25 0.4 2253.7 2.6X
[info] SQL Parquet MR: DataPageV1 4106 4140 20 0.3 3916.2 1.5X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 4288 4346 41 0.2 4089.6 1.4X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 2131 2275 101 0.5 2032.0 2.9X
[info] SQL Parquet MR: DataPageV2 4636 4684 31 0.2 4421.4 1.3X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 4873 4918 34 0.2 4647.3 1.3X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 1795 1821 12 0.6 1711.7 3.4X
[info] Running benchmark: Int and String Scan
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 12344 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 9227 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 3 iterations, 2641 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 2 iterations, 2070 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 5220 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 4813 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 2 iterations, 2185 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 5517 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Int and String Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 6150 6172 32 1.7 586.5 1.0X
[info] SQL Json 4610 4614 6 2.3 439.6 1.3X
[info] SQL Parquet Vectorized: DataPageV1 875 880 7 12.0 83.5 7.0X
[info] SQL Parquet Vectorized: DataPageV2 1027 1035 11 10.2 98.0 6.0X
[info] SQL Parquet MR: DataPageV1 2609 2610 2 4.0 248.8 2.4X
[info] SQL Parquet MR: DataPageV2 2406 2407 1 4.4 229.4 2.6X
[info] SQL ORC Vectorized 1092 1093 1 9.6 104.2 5.6X
[info] SQL ORC MR 2705 2759 75 3.9 258.0 2.3X
[info] Running benchmark: Repeated String
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 7169 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 5833 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 6 iterations, 2338 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 6 iterations, 2362 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 2564 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 2277 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 8 iterations, 2058 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 2217 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Repeated String: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 3579 3585 8 2.9 341.3 1.0X
[info] SQL Json 2904 2917 17 3.6 277.0 1.2X
[info] SQL Parquet Vectorized: DataPageV1 386 390 4 27.2 36.8 9.3X
[info] SQL Parquet Vectorized: DataPageV2 390 394 3 26.9 37.2 9.2X
[info] SQL Parquet MR: DataPageV1 1281 1282 2 8.2 122.2 2.8X
[info] SQL Parquet MR: DataPageV2 1127 1139 17 9.3 107.5 3.2X
[info] SQL ORC Vectorized 242 257 28 43.4 23.0 14.8X
[info] SQL ORC MR 1104 1109 6 9.5 105.3 3.2X
[info] Running benchmark: Partitioned Table
[info] Running case: Data column - CSV
[info] Stopped after 2 iterations, 13828 ms
[info] Running case: Data column - Json
[info] Stopped after 2 iterations, 8637 ms
[info] Running case: Data column - Parquet Vectorized: DataPageV1
[info] Stopped after 27 iterations, 2021 ms
[info] Running case: Data column - Parquet Vectorized: DataPageV2
[info] Stopped after 22 iterations, 2010 ms
[info] Running case: Data column - Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3474 ms
[info] Running case: Data column - Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3346 ms
[info] Running case: Data column - ORC Vectorized
[info] Stopped after 17 iterations, 2015 ms
[info] Running case: Data column - ORC MR
[info] Stopped after 2 iterations, 3251 ms
[info] Running case: Partition column - CSV
[info] Stopped after 2 iterations, 4098 ms
[info] Running case: Partition column - Json
[info] Stopped after 2 iterations, 7987 ms
[info] Running case: Partition column - Parquet Vectorized: DataPageV1
[info] Stopped after 71 iterations, 2004 ms
[info] Running case: Partition column - Parquet Vectorized: DataPageV2
[info] Stopped after 77 iterations, 2009 ms
[info] Running case: Partition column - Parquet MR: DataPageV1
[info] Stopped after 3 iterations, 2716 ms
[info] Running case: Partition column - Parquet MR: DataPageV2
[info] Stopped after 3 iterations, 2664 ms
[info] Running case: Partition column - ORC Vectorized
[info] Stopped after 71 iterations, 2019 ms
[info] Running case: Partition column - ORC MR
[info] Stopped after 2 iterations, 2034 ms
[info] Running case: Both columns - CSV
[info] Stopped after 2 iterations, 13219 ms
[info] Running case: Both columns - Json
[info] Stopped after 2 iterations, 8792 ms
[info] Running case: Both columns - Parquet Vectorized: DataPageV1
[info] Stopped after 27 iterations, 2022 ms
[info] Running case: Both columns - Parquet Vectorized: DataPageV2
[info] Stopped after 20 iterations, 2034 ms
[info] Running case: Both columns - Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3653 ms
[info] Running case: Both columns - Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3523 ms
[info] Running case: Both columns - ORC Vectorized
[info] Stopped after 15 iterations, 2070 ms
[info] Running case: Both columns - ORC MR
[info] Stopped after 2 iterations, 3394 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Partitioned Table: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------
[info] Data column - CSV 6861 6914 76 2.3 436.2 1.0X
[info] Data column - Json 4307 4319 17 3.7 273.8 1.6X
[info] Data column - Parquet Vectorized: DataPageV1 59 75 12 267.1 3.7 116.5X
[info] Data column - Parquet Vectorized: DataPageV2 82 91 7 190.8 5.2 83.2X
[info] Data column - Parquet MR: DataPageV1 1722 1737 21 9.1 109.5 4.0X
[info] Data column - Parquet MR: DataPageV2 1660 1673 19 9.5 105.5 4.1X
[info] Data column - ORC Vectorized 112 119 5 140.3 7.1 61.2X
[info] Data column - ORC MR 1575 1626 72 10.0 100.1 4.4X
[info] Partition column - CSV 2043 2049 9 7.7 129.9 3.4X
[info] Partition column - Json 3986 3994 10 3.9 253.4 1.7X
[info] Partition column - Parquet Vectorized: DataPageV1 24 28 5 668.3 1.5 291.5X
[info] Partition column - Parquet Vectorized: DataPageV2 23 26 3 697.1 1.4 304.1X
[info] Partition column - Parquet MR: DataPageV1 903 906 3 17.4 57.4 7.6X
[info] Partition column - Parquet MR: DataPageV2 860 888 29 18.3 54.7 8.0X
[info] Partition column - ORC Vectorized 25 28 3 640.0 1.6 279.2X
[info] Partition column - ORC MR 980 1017 53 16.1 62.3 7.0X
[info] Both columns - CSV 6606 6610 5 2.4 420.0 1.0X
[info] Both columns - Json 4383 4396 19 3.6 278.6 1.6X
[info] Both columns - Parquet Vectorized: DataPageV1 70 75 3 224.0 4.5 97.7X
[info] Both columns - Parquet Vectorized: DataPageV2 97 102 6 161.9 6.2 70.6X
[info] Both columns - Parquet MR: DataPageV1 1809 1827 25 8.7 115.0 3.8X
[info] Both columns - Parquet MR: DataPageV2 1735 1762 38 9.1 110.3 4.0X
[info] Both columns - ORC Vectorized 133 138 3 118.2 8.5 51.6X
[info] Both columns - ORC MR 1630 1697 95 9.7 103.6 4.2X
[info] Running benchmark: String with Nulls Scan (0.0%)
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 8947 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 7904 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 4 iterations, 2146 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 3 iterations, 2207 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 4289 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 4373 ms
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 6 iterations, 2002 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 4 iterations, 2153 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 5 iterations, 2367 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3630 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] String with Nulls Scan (0.0%): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 4370 4474 147 2.4 416.7 1.0X
[info] SQL Json 3939 3952 19 2.7 375.6 1.1X
[info] SQL Parquet Vectorized: DataPageV1 535 537 3 19.6 51.0 8.2X
[info] SQL Parquet Vectorized: DataPageV2 735 736 1 14.3 70.1 5.9X
[info] SQL Parquet MR: DataPageV1 2140 2145 7 4.9 204.0 2.0X
[info] SQL Parquet MR: DataPageV2 2179 2187 11 4.8 207.8 2.0X
[info] ParquetReader Vectorized: DataPageV1 332 334 2 31.6 31.6 13.2X
[info] ParquetReader Vectorized: DataPageV2 536 538 2 19.5 51.2 8.1X
[info] SQL ORC Vectorized 463 474 12 22.6 44.2 9.4X
[info] SQL ORC MR 1805 1815 15 5.8 172.1 2.4X
[info] Running benchmark: String with Nulls Scan (50.0%)
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 6799 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 7101 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 5 iterations, 2189 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 4 iterations, 2306 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3879 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 4176 ms
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 6 iterations, 2138 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 4 iterations, 2004 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 4 iterations, 2553 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 4107 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] String with Nulls Scan (50.0%): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 3387 3400 18 3.1 323.0 1.0X
[info] SQL Json 3524 3551 37 3.0 336.1 1.0X
[info] SQL Parquet Vectorized: DataPageV1 436 438 2 24.1 41.5 7.8X
[info] SQL Parquet Vectorized: DataPageV2 572 577 5 18.3 54.5 5.9X
[info] SQL Parquet MR: DataPageV1 1940 1940 0 5.4 185.0 1.7X
[info] SQL Parquet MR: DataPageV2 2087 2088 2 5.0 199.0 1.6X
[info] ParquetReader Vectorized: DataPageV1 355 356 1 29.6 33.8 9.5X
[info] ParquetReader Vectorized: DataPageV2 497 501 4 21.1 47.4 6.8X
[info] SQL ORC Vectorized 636 638 3 16.5 60.6 5.3X
[info] SQL ORC MR 2042 2054 17 5.1 194.8 1.7X
[info] Running benchmark: String with Nulls Scan (95.0%)
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 5083 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 5235 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 20 iterations, 2067 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 19 iterations, 2008 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 2805 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 2727 ms
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 22 iterations, 2049 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 21 iterations, 2090 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 11 iterations, 2135 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 2567 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] String with Nulls Scan (95.0%): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 2526 2542 23 4.2 240.9 1.0X
[info] SQL Json 2610 2618 11 4.0 248.9 1.0X
[info] SQL Parquet Vectorized: DataPageV1 97 103 5 108.2 9.2 26.1X
[info] SQL Parquet Vectorized: DataPageV2 100 106 3 104.4 9.6 25.1X
[info] SQL Parquet MR: DataPageV1 1378 1403 35 7.6 131.4 1.8X
[info] SQL Parquet MR: DataPageV2 1363 1364 1 7.7 130.0 1.9X
[info] ParquetReader Vectorized: DataPageV1 91 93 2 115.2 8.7 27.8X
[info] ParquetReader Vectorized: DataPageV2 98 100 1 107.5 9.3 25.9X
[info] SQL ORC Vectorized 189 194 4 55.5 18.0 13.4X
[info] SQL ORC MR 1240 1284 62 8.5 118.2 2.0X
[info] Running benchmark: Single Column Scan from 10 columns
[info] Running case: SQL CSV
[info] Stopped after 3 iterations, 2377 ms
[info] Running case: SQL Json
[info] Stopped after 3 iterations, 2445 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 80 iterations, 2003 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 79 iterations, 2000 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 15 iterations, 2037 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 16 iterations, 2112 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 74 iterations, 2001 ms
[info] Running case: SQL ORC MR
[info] Stopped after 17 iterations, 2025 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Single Column Scan from 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 792 792 0 1.3 755.3 1.0X
[info] SQL Json 811 815 4 1.3 773.6 1.0X
[info] SQL Parquet Vectorized: DataPageV1 21 25 5 49.9 20.0 37.7X
[info] SQL Parquet Vectorized: DataPageV2 22 25 3 47.4 21.1 35.8X
[info] SQL Parquet MR: DataPageV1 133 136 2 7.9 127.0 5.9X
[info] SQL Parquet MR: DataPageV2 127 132 3 8.3 120.8 6.3X
[info] SQL ORC Vectorized 23 27 3 44.9 22.3 33.9X
[info] SQL ORC MR 105 119 5 10.0 99.9 7.6X
[info] 11:32:00.966 WARN org.apache.spark.sql.catalyst.util.SparkStringUtils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
[info] Running benchmark: Single Column Scan from 50 columns
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 2741 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 5693 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 73 iterations, 2012 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 70 iterations, 2007 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 16 iterations, 2127 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 16 iterations, 2002 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 62 iterations, 2021 ms
[info] Running case: SQL ORC MR
[info] Stopped after 16 iterations, 2092 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Single Column Scan from 50 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 1356 1371 21 0.8 1292.8 1.0X
[info] SQL Json 2846 2847 1 0.4 2714.2 0.5X
[info] SQL Parquet Vectorized: DataPageV1 23 28 5 45.9 21.8 59.4X
[info] SQL Parquet Vectorized: DataPageV2 25 29 3 42.6 23.5 55.1X
[info] SQL Parquet MR: DataPageV1 129 133 3 8.1 123.0 10.5X
[info] SQL Parquet MR: DataPageV2 115 125 5 9.1 110.1 11.7X
[info] SQL ORC Vectorized 27 33 6 38.3 26.1 49.6X
[info] SQL ORC MR 127 131 2 8.3 120.7 10.7X
[info] Running benchmark: Single Column Scan from 100 columns
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 4301 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 11071 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 51 iterations, 2029 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 61 iterations, 2029 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 15 iterations, 2040 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 16 iterations, 2005 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 54 iterations, 2028 ms
[info] Running case: SQL ORC MR
[info] Stopped after 15 iterations, 2108 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Single Column Scan from 100 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 2140 2151 16 0.5 2040.4 1.0X
[info] SQL Json 5309 5536 320 0.2 5063.2 0.4X
[info] SQL Parquet Vectorized: DataPageV1 31 40 13 34.4 29.1 70.1X
[info] SQL Parquet Vectorized: DataPageV2 30 33 3 35.5 28.2 72.5X
[info] SQL Parquet MR: DataPageV1 127 136 5 8.2 121.4 16.8X
[info] SQL Parquet MR: DataPageV2 121 125 4 8.7 115.0 17.7X
[info] SQL ORC Vectorized 33 38 3 31.8 31.5 64.8X
[info] SQL ORC MR 138 141 3 7.6 131.3 15.5X
[success] Total time: 2498 s (41:38), completed Jul 1, 2024, 11:35:08 AM
BuiltInDataSourceWriteBenchmark
[info] running (fork) org.apache.spark.sql.execution.benchmark.BuiltInDataSourceWriteBenchmark parquet
[error] WARNING: Using incubator modules: jdk.incubator.vector
[info] 11:37:49.340 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] Running benchmark: parquet writer benchmark
[info] Running case: Output Single Int Column
[info] Stopped after 2 iterations, 2962 ms
[info] Running case: Output Single Double Column
[info] Stopped after 2 iterations, 3030 ms
[info] Running case: Output Int and String Column
[info] Stopped after 2 iterations, 5383 ms
[info] Running case: Output Partitions
[info] Stopped after 2 iterations, 4587 ms
[info] Running case: Output Buckets
[info] Stopped after 2 iterations, 5951 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] parquet writer benchmark: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Output Single Int Column 1405 1481 107 11.2 89.4 1.0X
[info] Output Single Double Column 1507 1515 12 10.4 95.8 0.9X
[info] Output Int and String Column 2689 2692 4 5.8 171.0 0.5X
[info] Output Partitions 2289 2294 6 6.9 145.6 0.6X
[info] Output Buckets 2973 2976 4 5.3 189.0 0.5X
[success] Total time: 108 s (01:48), completed Jul 1, 2024, 11:38:28 AM
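A note on reading these tables: the `Relative` column appears to be the first row's best time divided by each row's best time, so values below 1.0X are slower than the first row. The Python sketch below checks that reading against the parquet writer table above. The helper name `relative` is hypothetical and the formula is inferred from the numbers in this log, not taken from the benchmark source.

```python
# Best Time(ms) values copied from the parquet writer benchmark table above.
best_ms = {
    "Output Single Int Column": 1405,
    "Output Single Double Column": 1507,
    "Output Int and String Column": 2689,
    "Output Partitions": 2289,
    "Output Buckets": 2973,
}


def relative(best_times):
    """Assumed convention: first row's best time / each row's best time."""
    baseline = next(iter(best_times.values()))  # first row acts as the 1.0X baseline
    return {name: f"{baseline / ms:.1f}X" for name, ms in best_times.items()}


for name, rel in relative(best_ms).items():
    print(f"{name:30s} {rel}")
```

The logged best times are already rounded to whole milliseconds, so a recomputed value that sits near a rounding boundary may differ from the logged one by 0.1X.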
Main branch
DataSourceReadBenchmark
[info] running (fork) org.apache.spark.sql.execution.benchmark.DataSourceReadBenchmark
[error] WARNING: Using incubator modules: jdk.incubator.vector
[info] 11:40:13.033 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] Running benchmark: SQL Single BOOLEAN Column Scan
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 11283 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 6570 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 42 iterations, 2031 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 42 iterations, 2004 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 2693 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 2559 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 37 iterations, 2031 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 2971 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single BOOLEAN Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 5637 5642 7 2.8 358.4 1.0X
[info] SQL Json 3237 3285 69 4.9 205.8 1.7X
[info] SQL Parquet Vectorized: DataPageV1 40 48 8 390.8 2.6 140.1X
[info] SQL Parquet Vectorized: DataPageV2 42 48 5 371.3 2.7 133.1X
[info] SQL Parquet MR: DataPageV1 1316 1347 43 11.9 83.7 4.3X
[info] SQL Parquet MR: DataPageV2 1269 1280 15 12.4 80.7 4.4X
[info] SQL ORC Vectorized 50 55 3 312.1 3.2 111.8X
[info] SQL ORC MR 1465 1486 28 10.7 93.2 3.8X
[info] Running benchmark: Parquet Reader Single BOOLEAN Column Scan
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 50 iterations, 2017 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 43 iterations, 2024 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV1
[info] Stopped after 110 iterations, 2010 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV2
[info] Stopped after 77 iterations, 2022 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single BOOLEAN Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1 35 40 3 443.8 2.3 1.0X
[info] ParquetReader Vectorized: DataPageV2 41 47 3 383.4 2.6 0.9X
[info] ParquetReader Vectorized -> Row: DataPageV1 17 18 1 924.0 1.1 2.1X
[info] ParquetReader Vectorized -> Row: DataPageV2 25 26 1 641.3 1.6 1.4X
[info] Running benchmark: SQL Single TINYINT Column Scan
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 11597 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 7678 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 34 iterations, 2008 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 36 iterations, 2109 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3264 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 2792 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 34 iterations, 2065 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 2931 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single TINYINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 5622 5799 250 2.8 357.4 1.0X
[info] SQL Json 3820 3839 27 4.1 242.9 1.5X
[info] SQL Parquet Vectorized: DataPageV1 53 59 5 297.8 3.4 106.5X
[info] SQL Parquet Vectorized: DataPageV2 51 59 10 311.3 3.2 111.3X
[info] SQL Parquet MR: DataPageV1 1626 1632 9 9.7 103.4 3.5X
[info] SQL Parquet MR: DataPageV2 1379 1396 25 11.4 87.7 4.1X
[info] SQL ORC Vectorized 57 61 3 276.5 3.6 98.8X
[info] SQL ORC MR 1360 1466 150 11.6 86.5 4.1X
[info] Running benchmark: Parquet Reader Single TINYINT Column Scan
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 47 iterations, 2035 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 48 iterations, 2021 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV1
[info] Stopped after 78 iterations, 2007 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV2
[info] Stopped after 77 iterations, 2017 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single TINYINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1 41 43 2 386.8 2.6 1.0X
[info] ParquetReader Vectorized: DataPageV2 40 42 1 388.7 2.6 1.0X
[info] ParquetReader Vectorized -> Row: DataPageV1 24 26 1 644.9 1.6 1.7X
[info] ParquetReader Vectorized -> Row: DataPageV2 25 26 1 640.3 1.6 1.7X
[info] Running benchmark: SQL Single SMALLINT Column Scan
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 12016 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 8230 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 27 iterations, 2061 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 27 iterations, 2065 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3281 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 2944 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 24 iterations, 2072 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3375 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single SMALLINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 5939 6008 98 2.6 377.6 1.0X
[info] SQL Json 4103 4115 17 3.8 260.9 1.4X
[info] SQL Parquet Vectorized: DataPageV1 62 76 18 255.0 3.9 96.3X
[info] SQL Parquet Vectorized: DataPageV2 69 77 10 229.1 4.4 86.5X
[info] SQL Parquet MR: DataPageV1 1610 1641 44 9.8 102.3 3.7X
[info] SQL Parquet MR: DataPageV2 1451 1472 31 10.8 92.2 4.1X
[info] SQL ORC Vectorized 82 86 4 191.3 5.2 72.2X
[info] SQL ORC MR 1682 1688 8 9.4 106.9 3.5X
[info] Running benchmark: Parquet Reader Single SMALLINT Column Scan
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 20 iterations, 2022 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 19 iterations, 2034 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV1
[info] Stopped after 25 iterations, 2059 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV2
[info] Stopped after 22 iterations, 2030 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single SMALLINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1 98 101 1 160.1 6.2 1.0X
[info] ParquetReader Vectorized: DataPageV2 106 107 1 149.0 6.7 0.9X
[info] ParquetReader Vectorized -> Row: DataPageV1 80 82 7 196.5 5.1 1.2X
[info] ParquetReader Vectorized -> Row: DataPageV2 91 92 1 172.6 5.8 1.1X
[info] Running benchmark: SQL Single INT Column Scan
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 12657 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 8705 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 28 iterations, 2025 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 18 iterations, 2034 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 2939 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 2783 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 18 iterations, 2097 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3103 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single INT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 6310 6329 26 2.5 401.2 1.0X
[info] SQL Json 4331 4353 30 3.6 275.4 1.5X
[info] SQL Parquet Vectorized: DataPageV1 60 72 13 260.7 3.8 104.6X
[info] SQL Parquet Vectorized: DataPageV2 105 113 11 149.2 6.7 59.9X
[info] SQL Parquet MR: DataPageV1 1463 1470 9 10.8 93.0 4.3X
[info] SQL Parquet MR: DataPageV2 1379 1392 17 11.4 87.7 4.6X
[info] SQL ORC Vectorized 108 117 8 145.7 6.9 58.5X
[info] SQL ORC MR 1524 1552 38 10.3 96.9 4.1X
[info] Running benchmark: Parquet Reader Single INT Column Scan
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 22 iterations, 2056 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 17 iterations, 2077 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV1
[info] Stopped after 26 iterations, 2025 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV2
[info] Stopped after 20 iterations, 2075 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single INT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1 90 93 5 175.7 5.7 1.0X
[info] ParquetReader Vectorized: DataPageV2 119 122 4 132.5 7.5 0.8X
[info] ParquetReader Vectorized -> Row: DataPageV1 77 78 1 205.3 4.9 1.2X
[info] ParquetReader Vectorized -> Row: DataPageV2 102 104 1 153.5 6.5 0.9X
[info] Running benchmark: SQL Single BIGINT Column Scan
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 13403 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 8273 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 14 iterations, 2133 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 16 iterations, 2012 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3322 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3042 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 18 iterations, 2083 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 2902 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single BIGINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 6564 6702 194 2.4 417.4 1.0X
[info] SQL Json 4125 4137 16 3.8 262.3 1.6X
[info] SQL Parquet Vectorized: DataPageV1 144 152 8 109.1 9.2 45.6X
[info] SQL Parquet Vectorized: DataPageV2 119 126 5 132.6 7.5 55.3X
[info] SQL Parquet MR: DataPageV1 1638 1661 33 9.6 104.1 4.0X
[info] SQL Parquet MR: DataPageV2 1517 1521 7 10.4 96.4 4.3X
[info] SQL ORC Vectorized 110 116 5 143.0 7.0 59.7X
[info] SQL ORC MR 1435 1451 24 11.0 91.2 4.6X
[info] Running benchmark: Parquet Reader Single BIGINT Column Scan
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 12 iterations, 2049 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 13 iterations, 2021 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV1
[info] Stopped after 13 iterations, 2013 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV2
[info] Stopped after 15 iterations, 2046 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single BIGINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1 168 171 2 93.7 10.7 1.0X
[info] ParquetReader Vectorized: DataPageV2 152 156 4 103.2 9.7 1.1X
[info] ParquetReader Vectorized -> Row: DataPageV1 152 155 2 103.6 9.7 1.1X
[info] ParquetReader Vectorized -> Row: DataPageV2 135 136 1 116.5 8.6 1.2X
[info] Running benchmark: SQL Single FLOAT Column Scan
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 13299 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 9999 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 27 iterations, 2023 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 31 iterations, 2062 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3360 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3398 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 17 iterations, 2014 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3162 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single FLOAT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 6619 6650 44 2.4 420.8 1.0X
[info] SQL Json 4968 5000 45 3.2 315.9 1.3X
[info] SQL Parquet Vectorized: DataPageV1 64 75 14 244.9 4.1 103.0X
[info] SQL Parquet Vectorized: DataPageV2 62 67 3 252.4 4.0 106.2X
[info] SQL Parquet MR: DataPageV1 1646 1680 48 9.6 104.6 4.0X
[info] SQL Parquet MR: DataPageV2 1670 1699 41 9.4 106.2 4.0X
[info] SQL ORC Vectorized 114 119 4 137.9 7.3 58.0X
[info] SQL ORC MR 1571 1581 15 10.0 99.9 4.2X
[info] Running benchmark: Parquet Reader Single FLOAT Column Scan
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 23 iterations, 2058 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 23 iterations, 2089 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV1
[info] Stopped after 26 iterations, 2030 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV2
[info] Stopped after 26 iterations, 2024 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single FLOAT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1 86 90 3 182.9 5.5 1.0X
[info] ParquetReader Vectorized: DataPageV2 88 91 1 178.2 5.6 1.0X
[info] ParquetReader Vectorized -> Row: DataPageV1 77 78 1 203.5 4.9 1.1X
[info] ParquetReader Vectorized -> Row: DataPageV2 77 78 1 204.2 4.9 1.1X
[info] Running benchmark: SQL Single DOUBLE Column Scan
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 13141 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 10028 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 13 iterations, 2096 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 14 iterations, 2121 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3624 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3625 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 7 iterations, 2205 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3786 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single DOUBLE Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 6562 6571 13 2.4 417.2 1.0X
[info] SQL Json 4994 5014 29 3.1 317.5 1.3X
[info] SQL Parquet Vectorized: DataPageV1 146 161 23 107.7 9.3 44.9X
[info] SQL Parquet Vectorized: DataPageV2 142 152 15 110.5 9.0 46.1X
[info] SQL Parquet MR: DataPageV1 1789 1812 32 8.8 113.8 3.7X
[info] SQL Parquet MR: DataPageV2 1806 1813 10 8.7 114.8 3.6X
[info] SQL ORC Vectorized 312 315 4 50.5 19.8 21.1X
[info] SQL ORC MR 1871 1893 32 8.4 118.9 3.5X
[info] Running benchmark: Parquet Reader Single DOUBLE Column Scan
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 13 iterations, 2165 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 12 iterations, 2010 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV1
[info] Stopped after 13 iterations, 2007 ms
[info] Running case: ParquetReader Vectorized -> Row: DataPageV2
[info] Stopped after 13 iterations, 2002 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Parquet Reader Single DOUBLE Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------
[info] ParquetReader Vectorized: DataPageV1 164 167 2 95.8 10.4 1.0X
[info] ParquetReader Vectorized: DataPageV2 166 168 1 94.6 10.6 1.0X
[info] ParquetReader Vectorized -> Row: DataPageV1 153 154 1 102.6 9.7 1.1X
[info] ParquetReader Vectorized -> Row: DataPageV2 153 154 1 102.9 9.7 1.1X
[info] Running benchmark: SQL Single TINYINT Column Scan in Struct
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3385 ms
[info] Running case: SQL ORC Vectorized (Nested Column Disabled)
[info] Stopped after 2 iterations, 3170 ms
[info] Running case: SQL ORC Vectorized (Nested Column Enabled)
[info] Stopped after 26 iterations, 2065 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3339 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3737 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info] Stopped after 29 iterations, 2048 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3278 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3460 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info] Stopped after 32 iterations, 2048 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single TINYINT Column Scan in Struct: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR 1662 1693 43 9.5 105.7 1.0X
[info] SQL ORC Vectorized (Nested Column Disabled) 1540 1585 64 10.2 97.9 1.1X
[info] SQL ORC Vectorized (Nested Column Enabled) 68 79 15 230.3 4.3 24.3X
[info] SQL Parquet MR: DataPageV1 1664 1670 8 9.5 105.8 1.0X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 1840 1869 41 8.5 117.0 0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 66 71 4 239.3 4.2 25.3X
[info] SQL Parquet MR: DataPageV2 1633 1639 9 9.6 103.8 1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 1723 1730 11 9.1 109.5 1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 57 64 5 274.6 3.6 29.0X
[info] Running benchmark: SQL Single SMALLINT Column Scan in Struct
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3679 ms
[info] Running case: SQL ORC Vectorized (Nested Column Disabled)
[info] Stopped after 2 iterations, 3897 ms
[info] Running case: SQL ORC Vectorized (Nested Column Enabled)
[info] Stopped after 12 iterations, 2041 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3546 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3925 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info] Stopped after 29 iterations, 2002 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3401 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3733 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info] Stopped after 18 iterations, 2032 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single SMALLINT Column Scan in Struct: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR 1807 1840 46 8.7 114.9 1.0X
[info] SQL ORC Vectorized (Nested Column Disabled) 1941 1949 11 8.1 123.4 0.9X
[info] SQL ORC Vectorized (Nested Column Enabled) 163 170 7 96.5 10.4 11.1X
[info] SQL Parquet MR: DataPageV1 1762 1773 15 8.9 112.1 1.0X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 1958 1963 7 8.0 124.5 0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 60 69 5 261.0 3.8 30.0X
[info] SQL Parquet MR: DataPageV2 1694 1701 10 9.3 107.7 1.1X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 1861 1867 7 8.5 118.3 1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 110 113 2 143.5 7.0 16.5X
[info] Running benchmark: SQL Single INT Column Scan in Struct
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3094 ms
[info] Running case: SQL ORC Vectorized (Nested Column Disabled)
[info] Stopped after 2 iterations, 3060 ms
[info] Running case: SQL ORC Vectorized (Nested Column Enabled)
[info] Stopped after 11 iterations, 2141 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3591 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3845 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info] Stopped after 29 iterations, 2031 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3360 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3654 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info] Stopped after 17 iterations, 2078 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single INT Column Scan in Struct: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR 1482 1547 92 10.6 94.2 1.0X
[info] SQL ORC Vectorized (Nested Column Disabled) 1528 1530 4 10.3 97.1 1.0X
[info] SQL ORC Vectorized (Nested Column Enabled) 189 195 7 83.4 12.0 7.9X
[info] SQL Parquet MR: DataPageV1 1773 1796 32 8.9 112.7 0.8X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 1906 1923 23 8.3 121.2 0.8X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 60 70 7 263.1 3.8 24.8X
[info] SQL Parquet MR: DataPageV2 1664 1680 23 9.5 105.8 0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 1821 1827 9 8.6 115.8 0.8X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 116 122 4 135.2 7.4 12.7X
[info] Running benchmark: SQL Single BIGINT Column Scan in Struct
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3220 ms
[info] Running case: SQL ORC Vectorized (Nested Column Disabled)
[info] Stopped after 2 iterations, 3203 ms
[info] Running case: SQL ORC Vectorized (Nested Column Enabled)
[info] Stopped after 11 iterations, 2110 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3757 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info] Stopped after 2 iterations, 4039 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info] Stopped after 14 iterations, 2039 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3222 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3525 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info] Stopped after 16 iterations, 2063 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single BIGINT Column Scan in Struct: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR 1606 1610 6 9.8 102.1 1.0X
[info] SQL ORC Vectorized (Nested Column Disabled) 1593 1602 13 9.9 101.3 1.0X
[info] SQL ORC Vectorized (Nested Column Enabled) 186 192 5 84.4 11.9 8.6X
[info] SQL Parquet MR: DataPageV1 1857 1879 31 8.5 118.1 0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 2005 2020 21 7.8 127.5 0.8X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 143 146 5 110.3 9.1 11.3X
[info] SQL Parquet MR: DataPageV2 1590 1611 30 9.9 101.1 1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 1737 1763 36 9.1 110.5 0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 119 129 11 132.6 7.5 13.5X
[info] Running benchmark: SQL Single FLOAT Column Scan in Struct
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3283 ms
[info] Running case: SQL ORC Vectorized (Nested Column Disabled)
[info] Stopped after 2 iterations, 3325 ms
[info] Running case: SQL ORC Vectorized (Nested Column Enabled)
[info] Stopped after 11 iterations, 2123 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3516 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3827 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info] Stopped after 29 iterations, 2024 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3216 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3558 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info] Stopped after 29 iterations, 2036 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single FLOAT Column Scan in Struct: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR 1636 1642 9 9.6 104.0 1.0X
[info] SQL ORC Vectorized (Nested Column Disabled) 1663 1663 0 9.5 105.7 1.0X
[info] SQL ORC Vectorized (Nested Column Enabled) 189 193 4 83.4 12.0 8.7X
[info] SQL Parquet MR: DataPageV1 1733 1758 35 9.1 110.2 0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 1900 1914 20 8.3 120.8 0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 63 70 4 249.3 4.0 25.9X
[info] SQL Parquet MR: DataPageV2 1596 1608 17 9.9 101.5 1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 1768 1779 16 8.9 112.4 0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 60 70 6 262.2 3.8 27.3X
[info] Running benchmark: SQL Single DOUBLE Column Scan in Struct
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3478 ms
[info] Running case: SQL ORC Vectorized (Nested Column Disabled)
[info] Stopped after 2 iterations, 3556 ms
[info] Running case: SQL ORC Vectorized (Nested Column Enabled)
[info] Stopped after 5 iterations, 2160 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3677 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info] Stopped after 2 iterations, 4027 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info] Stopped after 13 iterations, 2061 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3508 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info] Stopped after 2 iterations, 3748 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info] Stopped after 13 iterations, 2043 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Single DOUBLE Column Scan in Struct: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR 1735 1739 7 9.1 110.3 1.0X
[info] SQL ORC Vectorized (Nested Column Disabled) 1744 1778 49 9.0 110.9 1.0X
[info] SQL ORC Vectorized (Nested Column Enabled) 424 432 6 37.1 26.9 4.1X
[info] SQL Parquet MR: DataPageV1 1834 1839 7 8.6 116.6 0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 2008 2014 8 7.8 127.7 0.9X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 151 159 8 103.9 9.6 11.5X
[info] SQL Parquet MR: DataPageV2 1749 1754 8 9.0 111.2 1.0X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 1866 1874 11 8.4 118.6 0.9X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 148 157 16 106.2 9.4 11.7X
[info] Running benchmark: SQL Nested Column Scan
[info] Running case: SQL ORC MR
[info] Stopped after 10 iterations, 63927 ms
[info] Running case: SQL ORC Vectorized (Nested Column Disabled)
[info] Stopped after 10 iterations, 63452 ms
[info] Running case: SQL ORC Vectorized (Nested Column Enabled)
[info] Stopped after 10 iterations, 24147 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 10 iterations, 39452 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled)
[info] Stopped after 10 iterations, 42079 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled)
[info] Stopped after 10 iterations, 21439 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 10 iterations, 44573 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled)
[info] Stopped after 10 iterations, 46903 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled)
[info] Stopped after 10 iterations, 19188 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] SQL Nested Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------
[info] SQL ORC MR 6284 6393 77 0.2 5992.7 1.0X
[info] SQL ORC Vectorized (Nested Column Disabled) 6230 6345 80 0.2 5941.3 1.0X
[info] SQL ORC Vectorized (Nested Column Enabled) 2403 2415 15 0.4 2291.7 2.6X
[info] SQL Parquet MR: DataPageV1 3908 3945 51 0.3 3726.8 1.6X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 4178 4208 23 0.3 3984.3 1.5X
[info] SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 2057 2144 88 0.5 1961.9 3.1X
[info] SQL Parquet MR: DataPageV2 4415 4457 31 0.2 4210.8 1.4X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 4658 4690 19 0.2 4442.0 1.3X
[info] SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 1845 1919 39 0.6 1759.2 3.4X
[info] Running benchmark: Int and String Scan
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 12518 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 9416 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 3 iterations, 2708 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 2 iterations, 2137 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 4865 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 5053 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 2 iterations, 2212 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 4912 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Int and String Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 6194 6259 92 1.7 590.7 1.0X
[info] SQL Json 4696 4708 17 2.2 447.9 1.3X
[info] SQL Parquet Vectorized: DataPageV1 899 903 7 11.7 85.7 6.9X
[info] SQL Parquet Vectorized: DataPageV2 1038 1069 44 10.1 99.0 6.0X
[info] SQL Parquet MR: DataPageV1 2426 2433 9 4.3 231.4 2.6X
[info] SQL Parquet MR: DataPageV2 2487 2527 56 4.2 237.2 2.5X
[info] SQL ORC Vectorized 1096 1106 15 9.6 104.5 5.7X
[info] SQL ORC MR 2425 2456 44 4.3 231.2 2.6X
[info] Running benchmark: Repeated String
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 7111 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 5912 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 6 iterations, 2354 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 6 iterations, 2371 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 2417 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 2345 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 9 iterations, 2157 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 2438 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Repeated String: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 3541 3556 21 3.0 337.7 1.0X
[info] SQL Json 2950 2956 9 3.6 281.3 1.2X
[info] SQL Parquet Vectorized: DataPageV1 380 392 18 27.6 36.3 9.3X
[info] SQL Parquet Vectorized: DataPageV2 381 395 16 27.6 36.3 9.3X
[info] SQL Parquet MR: DataPageV1 1188 1209 29 8.8 113.3 3.0X
[info] SQL Parquet MR: DataPageV2 1143 1173 42 9.2 109.0 3.1X
[info] SQL ORC Vectorized 235 240 7 44.5 22.5 15.0X
[info] SQL ORC MR 1204 1219 22 8.7 114.8 2.9X
[info] Running benchmark: Partitioned Table
[info] Running case: Data column - CSV
[info] Stopped after 2 iterations, 13313 ms
[info] Running case: Data column - Json
[info] Stopped after 2 iterations, 8077 ms
[info] Running case: Data column - Parquet Vectorized: DataPageV1
[info] Stopped after 27 iterations, 2074 ms
[info] Running case: Data column - Parquet Vectorized: DataPageV2
[info] Stopped after 22 iterations, 2056 ms
[info] Running case: Data column - Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3593 ms
[info] Running case: Data column - Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3460 ms
[info] Running case: Data column - ORC Vectorized
[info] Stopped after 18 iterations, 2106 ms
[info] Running case: Data column - ORC MR
[info] Stopped after 2 iterations, 3226 ms
[info] Running case: Partition column - CSV
[info] Stopped after 2 iterations, 3998 ms
[info] Running case: Partition column - Json
[info] Stopped after 2 iterations, 7534 ms
[info] Running case: Partition column - Parquet Vectorized: DataPageV1
[info] Stopped after 49 iterations, 2022 ms
[info] Running case: Partition column - Parquet Vectorized: DataPageV2
[info] Stopped after 74 iterations, 2024 ms
[info] Running case: Partition column - Parquet MR: DataPageV1
[info] Stopped after 3 iterations, 2594 ms
[info] Running case: Partition column - Parquet MR: DataPageV2
[info] Stopped after 3 iterations, 2601 ms
[info] Running case: Partition column - ORC Vectorized
[info] Stopped after 69 iterations, 2032 ms
[info] Running case: Partition column - ORC MR
[info] Stopped after 3 iterations, 2215 ms
[info] Running case: Both columns - CSV
[info] Stopped after 2 iterations, 14015 ms
[info] Running case: Both columns - Json
[info] Stopped after 2 iterations, 8828 ms
[info] Running case: Both columns - Parquet Vectorized: DataPageV1
[info] Stopped after 26 iterations, 2013 ms
[info] Running case: Both columns - Parquet Vectorized: DataPageV2
[info] Stopped after 20 iterations, 2027 ms
[info] Running case: Both columns - Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3386 ms
[info] Running case: Both columns - Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 3123 ms
[info] Running case: Both columns - ORC Vectorized
[info] Stopped after 14 iterations, 2039 ms
[info] Running case: Both columns - ORC MR
[info] Stopped after 2 iterations, 3456 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Partitioned Table: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ---------------------------------------------------------------------------------------------------------------------------------
[info] Data column - CSV 6546 6657 157 2.4 416.2 1.0X
[info] Data column - Json 4025 4039 20 3.9 255.9 1.6X
[info] Data column - Parquet Vectorized: DataPageV1 59 77 11 265.8 3.8 110.6X
[info] Data column - Parquet Vectorized: DataPageV2 81 93 12 193.2 5.2 80.4X
[info] Data column - Parquet MR: DataPageV1 1795 1797 2 8.8 114.1 3.6X
[info] Data column - Parquet MR: DataPageV2 1716 1730 20 9.2 109.1 3.8X
[info] Data column - ORC Vectorized 112 117 5 140.7 7.1 58.6X
[info] Data column - ORC MR 1574 1613 55 10.0 100.1 4.2X
[info] Partition column - CSV 1996 1999 5 7.9 126.9 3.3X
[info] Partition column - Json 3708 3767 83 4.2 235.8 1.8X
[info] Partition column - Parquet Vectorized: DataPageV1 27 41 17 589.2 1.7 245.2X
[info] Partition column - Parquet Vectorized: DataPageV2 23 27 2 675.0 1.5 280.9X
[info] Partition column - Parquet MR: DataPageV1 826 865 38 19.0 52.5 7.9X
[info] Partition column - Parquet MR: DataPageV2 846 867 26 18.6 53.8 7.7X
[info] Partition column - ORC Vectorized 25 29 4 627.2 1.6 261.0X
[info] Partition column - ORC MR 737 739 2 21.3 46.9 8.9X
[info] Both columns - CSV 6987 7008 29 2.3 444.2 0.9X
[info] Both columns - Json 4397 4414 25 3.6 279.5 1.5X
[info] Both columns - Parquet Vectorized: DataPageV1 72 77 3 219.9 4.5 91.5X
[info] Both columns - Parquet Vectorized: DataPageV2 97 101 3 162.0 6.2 67.4X
[info] Both columns - Parquet MR: DataPageV1 1676 1693 25 9.4 106.5 3.9X
[info] Both columns - Parquet MR: DataPageV2 1555 1562 10 10.1 98.8 4.2X
[info] Both columns - ORC Vectorized 140 146 5 112.1 8.9 46.6X
[info] Both columns - ORC MR 1646 1728 117 9.6 104.6 4.0X
[info] Running benchmark: String with Nulls Scan (0.0%)
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 8548 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 8036 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 4 iterations, 2112 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 3 iterations, 2202 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3887 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 4609 ms
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 6 iterations, 2017 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 4 iterations, 2370 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 4 iterations, 2078 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 3689 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] String with Nulls Scan (0.0%): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 4238 4274 51 2.5 404.2 1.0X
[info] SQL Json 4007 4018 16 2.6 382.1 1.1X
[info] SQL Parquet Vectorized: DataPageV1 523 528 6 20.1 49.9 8.1X
[info] SQL Parquet Vectorized: DataPageV2 728 734 6 14.4 69.5 5.8X
[info] SQL Parquet MR: DataPageV1 1939 1944 7 5.4 184.9 2.2X
[info] SQL Parquet MR: DataPageV2 2298 2305 10 4.6 219.1 1.8X
[info] ParquetReader Vectorized: DataPageV1 331 336 3 31.7 31.6 12.8X
[info] ParquetReader Vectorized: DataPageV2 585 593 6 17.9 55.8 7.2X
[info] SQL ORC Vectorized 487 520 24 21.5 46.4 8.7X
[info] SQL ORC MR 1765 1845 113 5.9 168.3 2.4X
[info] Running benchmark: String with Nulls Scan (50.0%)
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 6797 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 7274 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 5 iterations, 2176 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 4 iterations, 2376 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 3895 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 4081 ms
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 6 iterations, 2235 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 4 iterations, 2097 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 3 iterations, 2099 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 4133 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] String with Nulls Scan (50.0%): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 3362 3399 52 3.1 320.7 1.0X
[info] SQL Json 3608 3637 42 2.9 344.1 0.9X
[info] SQL Parquet Vectorized: DataPageV1 422 435 20 24.9 40.2 8.0X
[info] SQL Parquet Vectorized: DataPageV2 580 594 27 18.1 55.3 5.8X
[info] SQL Parquet MR: DataPageV1 1938 1948 14 5.4 184.8 1.7X
[info] SQL Parquet MR: DataPageV2 2014 2041 37 5.2 192.1 1.7X
[info] ParquetReader Vectorized: DataPageV1 365 373 13 28.7 34.8 9.2X
[info] ParquetReader Vectorized: DataPageV2 522 524 2 20.1 49.8 6.4X
[info] SQL ORC Vectorized 696 700 3 15.1 66.4 4.8X
[info] SQL ORC MR 2057 2067 13 5.1 196.2 1.6X
[info] Running benchmark: String with Nulls Scan (95.0%)
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 5129 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 5273 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 16 iterations, 2114 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 15 iterations, 2004 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 2 iterations, 2410 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 2 iterations, 2365 ms
[info] Running case: ParquetReader Vectorized: DataPageV1
[info] Stopped after 22 iterations, 2056 ms
[info] Running case: ParquetReader Vectorized: DataPageV2
[info] Stopped after 19 iterations, 2062 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 11 iterations, 2153 ms
[info] Running case: SQL ORC MR
[info] Stopped after 2 iterations, 2660 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] String with Nulls Scan (95.0%): Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 2545 2565 29 4.1 242.7 1.0X
[info] SQL Json 2577 2637 84 4.1 245.8 1.0X
[info] SQL Parquet Vectorized: DataPageV1 116 132 13 90.7 11.0 22.0X
[info] SQL Parquet Vectorized: DataPageV2 128 134 5 82.2 12.2 20.0X
[info] SQL Parquet MR: DataPageV1 1188 1205 24 8.8 113.3 2.1X
[info] SQL Parquet MR: DataPageV2 1141 1183 59 9.2 108.8 2.2X
[info] ParquetReader Vectorized: DataPageV1 90 93 5 116.8 8.6 28.3X
[info] ParquetReader Vectorized: DataPageV2 106 109 2 99.0 10.1 24.0X
[info] SQL ORC Vectorized 188 196 7 55.7 18.0 13.5X
[info] SQL ORC MR 1303 1330 39 8.0 124.2 2.0X
[info] Running benchmark: Single Column Scan from 10 columns
[info] Running case: SQL CSV
[info] Stopped after 3 iterations, 2123 ms
[info] Running case: SQL Json
[info] Stopped after 3 iterations, 2333 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 75 iterations, 2008 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 78 iterations, 2013 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 16 iterations, 2105 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 16 iterations, 2109 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 76 iterations, 2015 ms
[info] Running case: SQL ORC MR
[info] Stopped after 17 iterations, 2118 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Single Column Scan from 10 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 696 708 17 1.5 663.7 1.0X
[info] SQL Json 776 778 2 1.4 739.9 0.9X
[info] SQL Parquet Vectorized: DataPageV1 22 27 5 48.0 20.9 31.8X
[info] SQL Parquet Vectorized: DataPageV2 23 26 3 45.9 21.8 30.4X
[info] SQL Parquet MR: DataPageV1 126 132 4 8.3 120.3 5.5X
[info] SQL Parquet MR: DataPageV2 121 132 7 8.7 115.3 5.8X
[info] SQL ORC Vectorized 23 27 3 44.8 22.3 29.7X
[info] SQL ORC MR 118 125 3 8.9 113.0 5.9X
[info] 12:17:16.486 WARN org.apache.spark.sql.catalyst.util.SparkStringUtils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
[info] Running benchmark: Single Column Scan from 50 columns
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 2604 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 5403 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 67 iterations, 2016 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 72 iterations, 2011 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 15 iterations, 2116 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 15 iterations, 2002 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 65 iterations, 2008 ms
[info] Running case: SQL ORC MR
[info] Stopped after 15 iterations, 2014 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Single Column Scan from 50 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 1300 1302 3 0.8 1240.0 1.0X
[info] SQL Json 2669 2702 47 0.4 2545.0 0.5X
[info] SQL Parquet Vectorized: DataPageV1 24 30 7 44.2 22.6 54.8X
[info] SQL Parquet Vectorized: DataPageV2 23 28 3 45.0 22.2 55.8X
[info] SQL Parquet MR: DataPageV1 127 141 9 8.3 120.7 10.3X
[info] SQL Parquet MR: DataPageV2 131 134 2 8.0 124.5 10.0X
[info] SQL ORC Vectorized 26 31 4 39.7 25.2 49.2X
[info] SQL ORC MR 131 134 2 8.0 125.2 9.9X
[info] Running benchmark: Single Column Scan from 100 columns
[info] Running case: SQL CSV
[info] Stopped after 2 iterations, 4357 ms
[info] Running case: SQL Json
[info] Stopped after 2 iterations, 9909 ms
[info] Running case: SQL Parquet Vectorized: DataPageV1
[info] Stopped after 52 iterations, 2016 ms
[info] Running case: SQL Parquet Vectorized: DataPageV2
[info] Stopped after 57 iterations, 2017 ms
[info] Running case: SQL Parquet MR: DataPageV1
[info] Stopped after 14 iterations, 2041 ms
[info] Running case: SQL Parquet MR: DataPageV2
[info] Stopped after 15 iterations, 2128 ms
[info] Running case: SQL ORC Vectorized
[info] Stopped after 53 iterations, 2033 ms
[info] Running case: SQL ORC MR
[info] Stopped after 15 iterations, 2121 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] Single Column Scan from 100 columns: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] SQL CSV 2169 2179 13 0.5 2069.0 1.0X
[info] SQL Json 4811 4955 204 0.2 4587.9 0.5X
[info] SQL Parquet Vectorized: DataPageV1 30 39 5 34.7 28.8 71.8X
[info] SQL Parquet Vectorized: DataPageV2 30 35 8 35.1 28.5 72.7X
[info] SQL Parquet MR: DataPageV1 139 146 5 7.6 132.2 15.7X
[info] SQL Parquet MR: DataPageV2 130 142 7 8.1 123.8 16.7X
[info] SQL ORC Vectorized 34 38 3 31.3 32.0 64.7X
[info] SQL ORC MR 138 141 2 7.6 131.5 15.7X
[success] Total time: 2477 s (41:17), completed Jul 1, 2024, 12:20:20 PM
BuiltInDataSourceWriteBenchmark
[info] running (fork) org.apache.spark.sql.execution.benchmark.BuiltInDataSourceWriteBenchmark parquet
[error] WARNING: Using incubator modules: jdk.incubator.vector
[info] 12:27:34.730 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] Running benchmark: parquet writer benchmark
[info] Running case: Output Single Int Column
[info] Stopped after 2 iterations, 2964 ms
[info] Running case: Output Single Double Column
[info] Stopped after 2 iterations, 2910 ms
[info] Running case: Output Int and String Column
[info] Stopped after 2 iterations, 5380 ms
[info] Running case: Output Partitions
[info] Stopped after 2 iterations, 4753 ms
[info] Running case: Output Buckets
[info] Stopped after 2 iterations, 6044 ms
[info] OpenJDK 64-Bit Server VM 21.0.3 on Mac OS X 14.5
[info] Apple M3 Max
[info] parquet writer benchmark: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] Output Single Int Column 1443 1482 55 10.9 91.8 1.0X
[info] Output Single Double Column 1427 1455 40 11.0 90.7 1.0X
[info] Output Int and String Column 2686 2690 6 5.9 170.7 0.5X
[info] Output Partitions 2368 2377 12 6.6 150.6 0.6X
[info] Output Buckets 3010 3022 18 5.2 191.4 0.5X
[success] Total time: 106 s (01:46), completed Jul 1, 2024, 12:28:13 PM
Verdict
I don't see any huge deviations from main. Sometimes this branch is a bit faster, sometimes the main branch is a bit faster. Does the deviation look acceptable to you?
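For anyone reading the tables above, here is a minimal Python sketch of how the `Relative` column appears to be derived (best time of the first row divided by the best time of each case), using the `Best Time(ms)` values copied from the "Int and String Scan" table in this run. The helper names `relative` and `pct_change` are hypothetical and only illustrate the arithmetic; they are not part of Spark's benchmark code.

```python
# Best Time(ms) values copied from the "Int and String Scan" table above.
best_ms = {
    "SQL CSV": 6194,
    "SQL Json": 4696,
    "SQL Parquet Vectorized: DataPageV1": 899,
    "SQL Parquet Vectorized: DataPageV2": 1038,
}


def relative(baseline_ms, case_ms):
    """Speed-up of a case over the baseline (first) row, rounded to one decimal.

    Assumption: this mirrors how the 'Relative' column is printed.
    """
    return round(baseline_ms / case_ms, 1)


def pct_change(reference_ms, candidate_ms):
    """Signed percentage change of candidate vs. reference (positive = slower)."""
    return (candidate_ms - reference_ms) / reference_ms * 100.0


baseline = best_ms["SQL CSV"]
for name, ms in best_ms.items():
    print(f"{name}: {relative(baseline, ms)}X")
```

The same `pct_change` helper could be pointed at the best time of a case on main and on this branch to put a number on "a bit faster"; with only two to a few iterations per case, small differences are likely within run-to-run noise.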
We should run the corresponding benchmarks using GitHub Actions and update their results in the PR, for both Java 17 and Java 21.
Kicked them off: https://github.com/Fokko/spark/actions/workflows/benchmark.yml
@LuciferYang I've updated the PR. Sorry, I wasn't aware that the benchmarks need to be run in GitHub Actions. I was assuming that the runners would be too noisy.
Thank you, @Fokko and all. Let me merge this for Apache Spark 4.0.0-preview2.