paimon
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
- We have done some tests. Parquet is 30% faster. - After [FLINK-30565](https://issues.apache.org/jira/browse/FLINK-30565), Parquet can support complex types and file systems such as OSS and S3 (decoupled from the Hadoop filesystem)....
[FLINK-26465] Optimize SortMergeReader: use loser tree to reduce comparisons
### Purpose Linked issue: [open [Feature] Introduce clone Action and Procedure](https://github.com/apache/paimon/issues/3229) ### Tests ### API and Format ### Documentation
### Purpose Linked issue: close (https://github.com/apache/incubator-paimon/issues/2861) When writing and reading branch data for Flink, it was found that if the "TableCommitImpl newcommit (String commitUser, String branchName)" interface is added like...
### Search before asking - [X] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar. ### Motivation In stream mode, the unaware bucket mode will use the `FIFOSplitAssigner`, so we...
### Purpose Linked issue: close #xxx Filter out invalid splits to improve Flink database compaction efficiency (1) Referring to the pickFullComparison method of CompactStrategy, filter out whether dataFiles is empty...
### Purpose This PR adds a `numWriters` metric to monitor the number of active writers in each parallelism. This metric can help us determine if the memory shortage is caused...
### Search before asking - [X] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar. ### Paimon version master ### Compute Engine flink ### Minimal reproduce step 1. First set...
### Purpose This PR is meant to support decoupling the delta files lifecycle (#2899). The basic idea behind this is: - Add fileSource in `DatafileMeta` to indicate whether this...