Draft: support Spark v2 write
Purpose
Linked issue: part of #4816
Support the Spark DataSource V2 write path to reduce write serialization overhead and speed up writes to primary key tables in Spark. Currently, only fixed-bucket tables are supported.
Tests
BucketFunctionTest, SparkWriteITCase
PaimonSourceWriteBenchmark:
Benchmark Mode Cnt Score Error Units
PaimonSourceWriteBenchmark.v1Write ss 5 13.845 ± 23.192 s/op
PaimonSourceWriteBenchmark.v2Write ss 5 9.579 ± 14.929 s/op
API and Format
Documentation
Adds a config, spark.sql.paimon.use-v2-write, that switches writes to the v2 path. Writes fall back to the v1 path when an unsupported scenario is encountered (e.g. a table in HASH_DYNAMIC bucket mode).
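The fallback rule can be illustrated with a small standalone sketch. This is not the code in this PR: the class `WritePathSketch`, the method `choose`, and the `WritePath` enum are hypothetical names, and the `BucketMode` enum here is a two-value stand-in rather than Paimon's actual type. It only assumes what is stated above: v2 is used when the flag is on and the table is fixed-bucket, and v1 is used otherwise.

```java
import java.util.EnumSet;
import java.util.Set;

public class WritePathSketch {
    // Illustrative stand-in, not Paimon's real enum (which has more values).
    enum BucketMode { HASH_FIXED, HASH_DYNAMIC }

    // Hypothetical label for the two write paths discussed in this PR.
    enum WritePath { V1, V2 }

    // Assumption taken from the description: only fixed-bucket tables support v2 for now.
    private static final Set<BucketMode> V2_SUPPORTED = EnumSet.of(BucketMode.HASH_FIXED);

    // useV2Write models the value of spark.sql.paimon.use-v2-write.
    static WritePath choose(boolean useV2Write, BucketMode mode) {
        if (useV2Write && V2_SUPPORTED.contains(mode)) {
            return WritePath.V2;
        }
        // Flag disabled, or unsupported scenario: fall back to v1.
        return WritePath.V1;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, BucketMode.HASH_FIXED));   // V2
        System.out.println(choose(true, BucketMode.HASH_DYNAMIC)); // V1 (fallback)
        System.out.println(choose(false, BucketMode.HASH_FIXED));  // V1 (flag off)
    }
}
```

In a Spark SQL session the flag could presumably be turned on with `SET spark.sql.paimon.use-v2-write=true;`, assuming it is a boolean option.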
Note: this is an overall draft PR, which will be split into smaller PRs for easier review.