xianjingfeng

Results 40 issues of xianjingfeng

Implement a table-based region grouping strategy for RegionGroupingProvider

### What changes were proposed in this pull request? 1. On the client side, try to read from all shuffle servers when reading shuffle data. When reading shuffle data from memory, pass `processBlockIds`...

1. If we set `spark.rss.data.replica.write=2` and `spark.rss.data.replica=3`, data integrity cannot be guaranteed on any single shuffle server, right? 2. But the method `org.apache.uniffle.storage.handler.impl.LocalFileQuorumClientReadHandler#readShuffleData` reads from only one shuffle server.
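The concern about reading from a single server can be checked with the usual quorum-overlap rule: a read is guaranteed to reach at least one replica that received the write only if read quorum + write quorum > replica count. The sketch below is generic arithmetic, not Uniffle code; the class and method names (`QuorumCheck`, `readSeesEveryWrite`, `minReadQuorum`) are hypothetical.

```java
public class QuorumCheck {
    // True if every read set of size readQuorum must overlap every
    // write set of size writeQuorum among `replicas` servers.
    static boolean readSeesEveryWrite(int replicas, int writeQuorum, int readQuorum) {
        return readQuorum + writeQuorum > replicas;
    }

    // Smallest read quorum that still guarantees overlap with any write set.
    static int minReadQuorum(int replicas, int writeQuorum) {
        return replicas - writeQuorum + 1;
    }

    public static void main(String[] args) {
        int replicas = 3;     // spark.rss.data.replica=3
        int writeQuorum = 2;  // spark.rss.data.replica.write=2
        System.out.println("read from 1 server is safe: "
            + readSeesEveryWrite(replicas, writeQuorum, 1));
        System.out.println("minimum servers to read: "
            + minReadQuorum(replicas, writeQuorum));
    }
}
```

With 3 replicas and a write quorum of 2, one server may have missed the write, so a read that consults only one server is not guaranteed to see complete data.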

Currently, every commit call made in `sendCommit` must succeed, so the application fails if a single shuffle server is dead.

### What changes were proposed in this pull request? Execute the start script with `nohup`. ### Why are the changes needed? The process does not exit when the start script is executed via Ansible. Therefore,...

We found that a shuffle server under high load easily hits `java.lang.OutOfMemoryError: Java heap space`, even when we allocate more JVM heap memory and lower `rss.server.buffer.capacity`. The steps for the...

Killing the process is not graceful, so the shuffle server needs to support decommissioning.

In `RssShuffleManager`, the `workQueue` of `threadPoolExecutor` is currently unbounded. If `sendShuffleData` is not fast enough, queued tasks will consume a lot of memory.
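One common way to cap the memory held by pending tasks is to give the executor a bounded queue and a rejection policy that pushes back on the producer. The following is a minimal sketch using only `java.util.concurrent`; it is not the change made in Uniffle, and the names `BoundedSendPool`, `newBoundedPool`, and `runTasks`, as well as the pool and queue sizes, are assumptions for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedSendPool {
    // Fixed-size pool whose work queue holds at most queueCapacity tasks.
    // When the queue is full, CallerRunsPolicy runs the task on the
    // submitting thread, which slows the producer instead of buffering more.
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
            threads, threads,
            60L, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(queueCapacity),
            new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Submits `tasks` trivial jobs and returns how many completed.
    static int runTasks(int threads, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = newBoundedPool(threads, queueCapacity);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 2 sender threads, at most 4 queued tasks.
        System.out.println("completed: " + runTasks(2, 4, 100));
    }
}
```

No task is dropped under this policy; the trade-off is that the submitting thread blocks on the work it runs itself whenever the senders fall behind.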

The process does not exit when the start script is executed via Ansible. Therefore, we cannot perform a batch start operation.