paimon
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
### Purpose
Linked issue: close #xxx

### Tests

### API and Format

### Documentation
### Purpose
Introduce a new way to ingest data from other sources into Paimon.

### Tests

### API and Format

### Documentation
### Purpose
Reduce the number of I/O calls made to object storage.

### Tests
N/A

### API and Format
N/A
### Purpose
This PR fixes an issue where the compaction logic in `BucketedAppendCompactManager` only filtered the first large file from the queue, leaving subsequent large files in the compaction set...
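The shape of the fix can be pictured as follows. This is a minimal, hypothetical sketch, not Paimon's actual code: `FileEntry` and `pickCandidates` stand in for `DataFileMeta` and the real queue handling inside `BucketedAppendCompactManager`.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for a data file; Paimon's real code uses DataFileMeta.
record FileEntry(String name, long sizeInBytes) {}

public class CompactCandidateFilter {

    /**
     * Picks compaction candidates from the queue while skipping every file
     * that already exceeds the target size. The reported bug dropped only the
     * first oversized file, so later oversized files slipped back into the
     * candidate set.
     */
    static List<FileEntry> pickCandidates(LinkedList<FileEntry> toCompact, long targetFileSize) {
        List<FileEntry> candidates = new ArrayList<>();
        while (!toCompact.isEmpty()) {
            FileEntry file = toCompact.pollFirst();
            if (file.sizeInBytes() >= targetFileSize) {
                // Oversized files never need rewriting; skip each one, not just the first.
                continue;
            }
            candidates.add(file);
        }
        return candidates;
    }
}
```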
## Purpose
from #5955

### What is the purpose of the change
Optimize SyncDatabaseAction performance by removing expensive `listTables` operations during initialization, improving scalability for databases with many tables.

###...
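A rough picture of the optimization, under stated assumptions: instead of one expensive `listTables` call per database at startup, each table's existence is resolved lazily and memoized. `TableCatalog` and `LazyTableResolver` are hypothetical names, not Paimon's actual API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical catalog interface standing in for Paimon's Catalog API.
interface TableCatalog {
    boolean tableExists(String database, String table); // cheap point lookup
}

/** Resolves each table on first use instead of enumerating the whole database. */
class LazyTableResolver {
    private final TableCatalog catalog;
    private final String database;
    private final Map<String, Boolean> known = new ConcurrentHashMap<>();

    LazyTableResolver(TableCatalog catalog, String database) {
        this.catalog = catalog;
        this.database = database;
    }

    boolean exists(String table) {
        // One memoized lookup per table; no listTables over the full database.
        return known.computeIfAbsent(table, t -> catalog.tableExists(database, t));
    }
}
```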
### Purpose
Reading table files involves a scan phase and a partition read phase. When we use bloom-filter or other file indexes, we found the scan metrics count the total data files...

### Tests

### API and Format

### Documentation
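To make the distinction concrete: the sketch below separates "files the scan produced" from "files actually read after index pruning". All names here are hypothetical; Paimon's real metrics classes differ.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical metrics holder: counts data files emitted by the scan phase
 * separately from files pruned by a file index (e.g. a bloom filter) and
 * files actually opened for reading.
 */
class ScanMetricsSketch {
    final AtomicLong scannedDataFiles = new AtomicLong(); // all files the scan emitted
    final AtomicLong indexSkippedFiles = new AtomicLong(); // files pruned by a file index
    final AtomicLong readDataFiles = new AtomicLong(); // files actually opened

    void onFileScanned() {
        scannedDataFiles.incrementAndGet();
    }

    void onIndexFiltered(boolean skippedByIndex) {
        if (skippedByIndex) {
            indexSkippedFiles.incrementAndGet();
        } else {
            readDataFiles.incrementAndGet();
        }
    }
}
```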
### Purpose
Linked issue: https://github.com/apache/paimon/issues/5932

Fix a bug where Spark 3.3 throws write exceptions during partial-update engine operations:

```
CREATE TABLE T(
  f1 int,
  f2 string,
  f3 string,
  f4 string
) TBLPROPERTIES...
```
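Since the DDL above is cut off, here is a self-contained, hypothetical reproduction of a partial-update table written from Spark. The table properties are assumptions based on Paimon's documented options ('primary-key', 'merge-engine' = 'partial-update'), not the PR's actual test, and a Spark session with the Paimon catalog already configured is assumed.

```java
import org.apache.spark.sql.SparkSession;

public class PartialUpdateExample {
    public static void main(String[] args) {
        // Assumes the Paimon catalog settings are already on the Spark config.
        SparkSession spark = SparkSession.builder()
                .appName("paimon-partial-update")
                .getOrCreate();

        spark.sql(
                "CREATE TABLE T (f1 INT, f2 STRING, f3 STRING, f4 STRING) "
                        + "TBLPROPERTIES ("
                        + "  'primary-key' = 'f1',"
                        + "  'merge-engine' = 'partial-update'"
                        + ")");

        // Two partial writes to the same key: non-null columns overwrite,
        // null columns keep the previously written value.
        spark.sql("INSERT INTO T VALUES (1, 'a', CAST(NULL AS STRING), CAST(NULL AS STRING))");
        spark.sql("INSERT INTO T VALUES (1, CAST(NULL AS STRING), 'b', CAST(NULL AS STRING))");

        spark.sql("SELECT * FROM T").show();
        spark.stop();
    }
}
```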
### Purpose
Linked issue: close #5724

When `LookupChangelogMergeFunctionWrapper` is used to produce changelog and 'changelog-producer.row-deduplicate' = 'true' is configured, add a check on the 'highLevel' record for whether it has been deleted...
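Read literally, the description suggests logic along these lines; this is a speculative sketch, with `Row` and `isDuplicate` as hypothetical simplifications of Paimon's `KeyValue`-based merge code.

```java
import java.util.Objects;

// Hypothetical simplified row; Paimon's real code works on KeyValue/InternalRow.
record Row(Object[] fields, boolean deleted) {}

class RowDeduplicateSketch {
    /**
     * With 'changelog-producer.row-deduplicate' = 'true', a new record should
     * only be suppressed when the looked-up ("highLevel") record exists, has
     * NOT been deleted, and carries identical field values. The missing
     * deletion check is what the PR adds.
     */
    static boolean isDuplicate(Row highLevel, Row incoming) {
        if (highLevel == null || highLevel.deleted()) {
            // A deleted high-level record can never deduplicate the new one.
            return false;
        }
        return Objects.deepEquals(highLevel.fields(), incoming.fields());
    }
}
```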
### Purpose
Linked issue: close #5709

Fix branch: paimon-1.2-snapshot

### Tests

```
spark.sql(
  """
    |create table my_table (
    |  k int,
    |  v string
    |) tblproperties (
    |  'primary-key'...
```
### Purpose
1. Monitor the input/output (I/O) of Flink during read and write operations.

### Tests

### API and Format

### Documentation
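For illustration, I/O of this kind is typically surfaced through Flink's standard metric API; the operator and the metric name below are made up for the example, not taken from the PR.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

/** Counts the bytes flowing through an operator as "read" I/O. */
public class IoCountingMap extends RichMapFunction<byte[], byte[]> {
    private transient Counter bytesRead;

    @Override
    public void open(Configuration parameters) {
        // Register a counter on the operator's metric group; the name is arbitrary.
        bytesRead = getRuntimeContext().getMetricGroup().counter("paimonBytesRead");
    }

    @Override
    public byte[] map(byte[] record) {
        bytesRead.inc(record.length);
        return record;
    }
}
```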