celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
### What changes were proposed in this pull request? Make some common dependencies optional for the openapi-client, spi, and cli modules. 1. Set the Hadoop dependencies to `provided` scope for cli modules. 2....
[CELEBORN-1497] Optimize read buffer dispatcher to retain buffers until worker has memory pressure
### What changes were proposed in this pull request? To improve the performance of the read buffer dispatcher. ### Why are the changes needed? In the map partition scenario, there can be lots of...
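The PR title describes retaining read buffers until the worker comes under memory pressure. A minimal sketch of that general idea follows, assuming a simple free list of `ByteBuffer`s and a caller-supplied memory-pressure signal. `RetainingBufferDispatcher`, `acquire`, `release`, and the `BooleanSupplier` signal are hypothetical names for illustration, not Celeborn's actual dispatcher API.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: keep released read buffers for reuse instead of dropping
// them right away, and only let them go once the worker reports memory pressure.
final class RetainingBufferDispatcher {
    private final int bufferSize;
    private final BooleanSupplier memoryPressure; // assumed signal from a memory manager
    private final Deque<ByteBuffer> retained = new ArrayDeque<>();

    RetainingBufferDispatcher(int bufferSize, BooleanSupplier memoryPressure) {
        this.bufferSize = bufferSize;
        this.memoryPressure = memoryPressure;
    }

    // Reuse a retained buffer if one exists; otherwise allocate a fresh one.
    synchronized ByteBuffer acquire() {
        ByteBuffer buf = retained.pollFirst();
        if (buf == null) {
            buf = ByteBuffer.allocate(bufferSize);
        }
        buf.clear(); // reset position/limit before handing it out again
        return buf;
    }

    // Retain the buffer for later reuse unless the worker is under memory pressure.
    synchronized void release(ByteBuffer buf) {
        if (memoryPressure.getAsBoolean()) {
            // Under pressure: drop this buffer and every retained one so the
            // garbage collector can reclaim their memory.
            retained.clear();
        } else {
            retained.addFirst(buf);
        }
    }

    synchronized int retainedCount() {
        return retained.size();
    }
}
```

Retaining buffers avoids repeated allocate/free cycles on the hot read path, at the cost of holding memory that another component could use; tying the release decision to a pressure signal is what bounds that cost in this sketch.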
### What changes were proposed in this pull request? To support revising lost shuffle IDs in long-running jobs such as Flink batch jobs. ### Why are the changes needed?...
### What changes were proposed in this pull request? Support a quota low watermark when checking whether quota is available. This prevents new jobs from running on Celeborn if the quota used...
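The description is cut off before it defines the watermark precisely. A minimal sketch, assuming the low watermark is a ratio of the total quota and that a new job is admitted only while usage stays below `quota * ratio`. `QuotaChecker`, `lowWatermarkRatio`, and `canAdmitNewJob` are hypothetical names, not Celeborn configuration keys or methods.

```java
// Hypothetical sketch of a low-watermark quota check. The ratio semantics are an
// assumption: usage is compared against quotaBytes * lowWatermarkRatio rather than
// against the full quota, which leaves headroom for jobs that are already running.
final class QuotaChecker {
    private final long quotaBytes;
    private final double lowWatermarkRatio; // assumed to lie in (0, 1]

    QuotaChecker(long quotaBytes, double lowWatermarkRatio) {
        if (lowWatermarkRatio <= 0 || lowWatermarkRatio > 1) {
            throw new IllegalArgumentException("lowWatermarkRatio must be in (0, 1]");
        }
        this.quotaBytes = quotaBytes;
        this.lowWatermarkRatio = lowWatermarkRatio;
    }

    // Admit a new job only while current usage is strictly below the low watermark.
    boolean canAdmitNewJob(long usedBytes) {
        long lowWatermark = (long) (quotaBytes * lowWatermarkRatio);
        return usedBytes < lowWatermark;
    }
}
```

With a 1000-byte quota and a ratio of 0.5, a new job is accepted at 499 bytes used but rejected at 500 or more, even though the full quota has not yet been reached.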
### What changes were proposed in this pull request? After profiling to find the hotspots in slot selection, we identified two main areas: - `iter.remove` ([link](https://github.com/apache/celeborn/blob/main/master/src/main/java/org/apache/celeborn/service/deploy/master/SlotsAllocator.java#L447)) is a...
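The snippet stops before explaining why `iter.remove` is a hotspot, so the following is a general illustration only, not necessarily what the PR changed. If the underlying collection is an `ArrayList`, each `Iterator.remove` shifts every later element left by one, so filtering n elements this way is O(n²) in the worst case. `FilterDemo` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

final class FilterDemo {
    // Quadratic worst case on an ArrayList: every remove() shifts the remaining tail.
    static <T> void removeWithIterator(List<T> list, Predicate<T> shouldRemove) {
        Iterator<T> iter = list.iterator();
        while (iter.hasNext()) {
            if (shouldRemove.test(iter.next())) {
                iter.remove();
            }
        }
    }

    // Linear alternative: copy the surviving elements into a new list in one pass.
    static <T> List<T> keepMatching(List<T> list, Predicate<T> shouldRemove) {
        List<T> kept = new ArrayList<>(list.size());
        for (T item : list) {
            if (!shouldRemove.test(item)) {
                kept.add(item);
            }
        }
        return kept;
    }
}
```

Both methods produce the same elements; only the cost differs. `ArrayList.removeIf` is another linear option when the list must be modified in place.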
### What changes were proposed in this pull request? This is joint work with @YutingWang98. Currently, we have to wait for the Spark shuffle object to be garbage-collected to clean the disk space occupied...
### What changes were proposed in this pull request? Change the `taskAnotherAttemptRunningOrSuccessful` method in `sparkUtils` (https://issues.apache.org/jira/browse/CELEBORN-1983). ### Why are the changes needed? The current behavior leads to job failure. ### Does this...
### What changes were proposed in this pull request? As titled. ### Why are the changes needed? To merge Resource.proto into TransportMessages.proto, as per the design below: https://cwiki.apache.org/confluence/display/CELEBORN/CIP-16+Merge+transport+proto+and+resource+proto+files ### Does this...
### What changes were proposed in this pull request? Remove the redundant release of data after an `OutOfDirectMemoryError` is thrown in `flushBuffer.addComponent`. ### Why are the changes needed? The reason why OutOfDirectMemoryError...
### What changes were proposed in this pull request? `org.apache.flink.runtime.io.network.partition.consumer.PartitionConnectionException` is thrown when `RemoteBufferStreamReader` determines that the current exception is a connection failure. ### Why are the changes needed? If...