Draft: Support Spark v2 write

Open zhongyujiang opened this issue 10 months ago • 0 comments

Purpose

Linked issue: part of #4816

Support the Spark DataSource V2 write path to reduce write serialization overhead and speed up writes to primary-key tables in Spark. Currently, only fixed-bucket tables are supported.

Tests

BucketFunctionTest, SparkWriteITCase

PaimonSourceWriteBenchmark:

Benchmark                           Mode  Cnt   Score    Error  Units
PaimonSourceWriteBenchmark.v1Write    ss    5  13.845 ± 23.192   s/op
PaimonSourceWriteBenchmark.v2Write    ss    5   9.579 ± 14.929   s/op
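
As a reading aid for the table above, the following minimal sketch works out the ratio between the two scores and checks whether the reported Score ± Error ranges overlap. The class and method names are hypothetical and are not part of the benchmark. The only inputs are the four numbers from the table.

```java
// Illustrative arithmetic on the benchmark table; all names here are hypothetical.
final class BenchmarkReading {
    private BenchmarkReading() {}

    // Ratio of v1 time to v2 time (values above 1 mean v2 took less time per op).
    static double ratio(double v1Score, double v2Score) {
        return v1Score / v2Score;
    }

    // True when [scoreA - errA, scoreA + errA] and [scoreB - errB, scoreB + errB] intersect.
    static boolean rangesOverlap(double scoreA, double errA, double scoreB, double errB) {
        double lowA = scoreA - errA;
        double highA = scoreA + errA;
        double lowB = scoreB - errB;
        double highB = scoreB + errB;
        return lowA <= highB && lowB <= highA;
    }

    public static void main(String[] args) {
        // Numbers copied from the table: v1Write 13.845 ± 23.192, v2Write 9.579 ± 14.929 (s/op).
        System.out.printf("v1/v2 ratio: %.3f%n", ratio(13.845, 9.579));
        System.out.println("ranges overlap: " + rangesOverlap(13.845, 23.192, 9.579, 14.929));
    }
}
```

The mean scores favor v2 write, but both error margins are larger than the scores themselves and the two ranges overlap, so five single-shot runs may not be enough to quantify the gain precisely.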

API and Format

Documentation

Adds a config, spark.sql.paimon.use-v2-write, that enables the v2 write path. Writes fall back to v1 when an unsupported scenario is encountered (e.g. a table in HASH_DYNAMIC bucket mode).
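
The fallback rule described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the enums, the class, and the method names are hypothetical stand-ins, and the only facts taken from the description are that fixed-bucket tables are supported and that HASH_DYNAMIC tables fall back to v1.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for the table's bucket mode; not Paimon's actual enum.
enum BucketMode { HASH_FIXED, HASH_DYNAMIC }

// Hypothetical label for the write path that ends up being used.
enum WritePath { V1, V2 }

final class WritePathSelector {
    // Assumption from the PR description: only fixed-bucket tables support v2 write for now.
    private static final Set<BucketMode> V2_SUPPORTED = EnumSet.of(BucketMode.HASH_FIXED);

    private WritePathSelector() {}

    // useV2Write models the value of spark.sql.paimon.use-v2-write.
    static WritePath select(boolean useV2Write, BucketMode mode) {
        if (useV2Write && V2_SUPPORTED.contains(mode)) {
            return WritePath.V2;
        }
        // Either the switch is off or the scenario is unsupported: use v1.
        return WritePath.V1;
    }

    public static void main(String[] args) {
        System.out.println(select(true, BucketMode.HASH_FIXED));
        System.out.println(select(true, BucketMode.HASH_DYNAMIC));
        System.out.println(select(false, BucketMode.HASH_FIXED));
    }
}
```

Keeping the supported modes in a single set means that extending v2 write to another bucket mode would only require adding it to that set in this sketch.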

Note: this is an overall draft PR; it will be split into smaller PRs for easier review.

Mar 10 '25 04:03 zhongyujiang