deltacat
Support limiting delta entries in a compaction round
Currently, we limit the deltas in a compaction round based only on the total object store memory available in the cluster. When a single very large delta contains many manifest files, we also need to limit those files and re-batch them.
From https://github.com/ray-project/deltacat/pull/70:
... the current contract of compaction assumes that each round must be able to compact at least one delta. To work with extremely large deltas we'll need to drive that down to at least file-level granularity (which will drive subsequent changes into the Round Completion File and each round that reads it to determine a starting point). Future improvements would then include driving each round down to record-level granularity to work with files that are too large to complete in a single round.
Primary key index building is a prerequisite to running multiple rounds: #63
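As a rough illustration of the file-level granularity described above, here is a minimal sketch of how manifest entries might be re-batched into rounds. It is not deltacat's implementation: `ManifestEntry`, `batch_entries`, both limit parameters, and the `start_after` resume point (standing in for whatever the Round Completion File would record) are all hypothetical names. It assumes entries arrive sorted by `(delta_id, file_index)`.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical stand-in for one file listed in a delta's manifest."""
    delta_id: int
    file_index: int
    size_bytes: int


def batch_entries(
    entries: List[ManifestEntry],
    max_bytes_per_round: int,
    max_entries_per_round: int,
    start_after: Optional[Tuple[int, int]] = None,
) -> Iterator[List[ManifestEntry]]:
    """Yield one batch of manifest entries per compaction round.

    A batch is closed when adding the next entry would exceed either limit.
    An entry larger than max_bytes_per_round still gets a batch of its own,
    because this sketch does not split below file-level granularity.
    """
    batch: List[ManifestEntry] = []
    batch_bytes = 0
    for entry in entries:
        # Skip files that a previous round already compacted (assumed resume point).
        if start_after is not None and (entry.delta_id, entry.file_index) <= start_after:
            continue
        too_big = batch_bytes + entry.size_bytes > max_bytes_per_round
        too_many = len(batch) >= max_entries_per_round
        if batch and (too_big or too_many):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(entry)
        batch_bytes += entry.size_bytes
    if batch:
        yield batch


# Example: one delta with five 40-byte files, then one delta with a 500-byte file.
entries = [ManifestEntry(1, i, 40) for i in range(5)] + [ManifestEntry(2, 0, 500)]
rounds = list(batch_entries(entries, max_bytes_per_round=100, max_entries_per_round=3))
resumed = list(batch_entries(entries, 100, 3, start_after=(1, 2)))
```

With these inputs, the first delta is spread across several rounds instead of having to fit in one, and `resumed` picks up from the file after `(1, 2)`. Handling the oversized 500-byte file properly would need the record-level granularity mentioned in the quote.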