xianjingfeng
Implement a table-based region grouping strategy for RegionGroupingProvider
### What changes were proposed in this pull request? 1. On the client side, try to read from all shuffle servers when reading shuffle data. When reading shuffle data from memory, pass `processBlockIds`...
1. If we set `spark.rss.data.replica.write=2` and `spark.rss.data.replica=3`, data integrity cannot be guaranteed on any single shuffle server, right? 2. But the method `org.apache.uniffle.storage.handler.impl.LocalFileQuorumClientReadHandler#readShuffleData` reads from only one shuffle server.
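A minimal sketch of the concern above, not Uniffle code: with 3 replicas and a write quorum of 2, each block is only guaranteed to be on 2 of the 3 servers, so a reader that consults a single server can miss blocks. The class `QuorumReadSketch`, its methods, and the block placement are hypothetical and only illustrate the standard quorum rule R + W > N.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; none of these names exist in Uniffle.
class QuorumReadSketch {

    // Smallest number of servers a reader must consult so that every read set
    // overlaps every possible write set: R + W > N, hence R = N - W + 1.
    static int minReadServers(int replica, int replicaWrite) {
        if (replicaWrite < 1 || replicaWrite > replica) {
            throw new IllegalArgumentException("replicaWrite must be in [1, replica]");
        }
        return replica - replicaWrite + 1;
    }

    // Union of the block IDs held by the first `count` servers.
    static Set<Long> readBlocks(List<Set<Long>> servers, int count) {
        Set<Long> result = new HashSet<>();
        for (int i = 0; i < count; i++) {
            result.addAll(servers.get(i));
        }
        return result;
    }

    // Assumed placement: every block reached exactly 2 of the 3 servers.
    static List<Set<Long>> examplePlacement() {
        return List.of(
                Set.of(1L, 3L),   // server 0 holds blocks 1 and 3
                Set.of(1L, 2L),   // server 1 holds blocks 1 and 2
                Set.of(2L, 3L));  // server 2 holds blocks 2 and 3
    }

    public static void main(String[] args) {
        List<Set<Long>> servers = examplePlacement();
        int needed = minReadServers(3, 2);
        System.out.println("servers to read: " + needed);
        System.out.println("one server sees " + readBlocks(servers, 1).size() + " blocks");
        System.out.println("quorum read sees " + readBlocks(servers, needed).size() + " blocks");
    }
}
```

In this placement, reading only server 0 misses block 2, while reading any 2 servers recovers all 3 blocks.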
Every commit call in `sendCommit` must currently succeed, so if one shuffle server dies, the application fails.
### What changes were proposed in this pull request? Execute the start script with nohup ### Why are the changes needed? The process doesn't exit if the start script is executed using Ansible. Therefore,...
We found that a shuffle server under high load easily encounters `java.lang.OutOfMemoryError: Java heap space`, even when we allocate more JVM heap memory and reduce `rss.server.buffer.capacity`. The steps for the...
Killing the process is not graceful, so we need the shuffle server to support decommissioning.
In `RssShuffleManager`, the `workQueue` of `threadPoolExecutor` is currently unbounded. If `sendShuffleData` is not fast enough, queued tasks will consume a lot of memory.
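One common way to cap that memory, sketched here under assumptions rather than taken from `RssShuffleManager`: give the executor a bounded queue and let the submitting thread run the task itself when the queue is full, which slows the producer down. The class `BoundedSendPoolSketch`, the pool sizes, and the simulated send are hypothetical; `ThreadPoolExecutor`, `ArrayBlockingQueue`, and `CallerRunsPolicy` are standard `java.util.concurrent` classes.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not the actual RssShuffleManager code.
class BoundedSendPoolSketch {

    // At most `queueCapacity` tasks can wait in memory. When the queue is full,
    // CallerRunsPolicy makes the submitting thread execute the task itself,
    // which applies back-pressure instead of queueing without limit.
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Submits `taskCount` slow tasks and returns how many completed.
    static int runTasks(int taskCount) {
        ThreadPoolExecutor pool = newBoundedPool(2, 4);
        AtomicInteger sent = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(5); // stands in for a slow send
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                sent.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sent.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks completed: " + runTasks(50));
    }
}
```

No task is dropped with this policy; the trade-off is that the submitting thread blocks on its own send whenever the workers fall behind.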
The process doesn't exit if the start script is executed using Ansible. Therefore, we can't use Ansible to start servers in batches.