xianjingfeng
We have 1.2T of memory per host, so we need to deploy multiple shuffle servers on a single node. But the current partition assignment policy is not suitable for that, because partition...
### What changes were proposed in this pull request? 1. Create a new RPC API for decommissioning in the shuffle server; this API can be requested from a shell. 2. When shuffle...
We found that the value of `grpc_open` is sometimes very large (>1000) even when no application is running in our cluster.
### What changes were proposed in this pull request? Write to HDFS when the local disk can't be written to ### Why are the changes needed? There should be a fallback mechanism...
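The fallback idea above can be sketched roughly as follows. This is only an illustration, not the code from the PR: the class `StorageFallback`, the enum `Target`, and the method `choose` are hypothetical names, and the writability probe uses the JDK's `Files.isWritable` rather than whatever health check the shuffle server really performs.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: pick a flush target, falling back to HDFS
// when the local directory cannot be written.
class StorageFallback {
    enum Target { LOCALFILE, HDFS }

    // Returns LOCALFILE only if the directory exists and is writable
    // by this JVM; otherwise the data should go to HDFS.
    static Target choose(Path localDir) {
        if (Files.isDirectory(localDir) && Files.isWritable(localDir)) {
            return Target.LOCALFILE;
        }
        return Target.HDFS;
    }

    public static void main(String[] args) {
        // Assumed example path; replace with a real storage base path.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/data/rssdata");
        System.out.println("flush target: " + choose(dir));
    }
}
```

A real implementation would also need to handle disks that are writable but full or corrupted, which a simple permission check does not detect.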
In `org.apache.uniffle.server.ShuffleFlushManager#processPendingEvents`, OOM will happen if a large number of events need to be dropped, because `usedMemory` is released immediately but GC is not fast enough to keep up.
Push mode would be more convenient and closer to real time. We have implemented it. If necessary, I will create a PR.
### What changes were proposed in this pull request? Limit the speed of memory release when dropping pending events ### Why are the changes needed? OOM will happen if a...
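One way to limit the release speed is to delay each release in proportion to the size of the dropped event, so that the accounted memory never falls faster than a configured rate. The sketch below is an assumption about how this could look, not the PR's implementation: `ThrottledMemoryRelease`, `delayMillis`, and `maxReleaseBytesPerSecond` are hypothetical names, and only the `usedMemory` counter comes from the description above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: release accounted memory for dropped events
// at a bounded rate so GC has time to reclaim the real buffers.
class ThrottledMemoryRelease {
    private final AtomicLong usedMemory;
    private final long maxReleaseBytesPerSecond;

    ThrottledMemoryRelease(long initialUsedMemory, long maxReleaseBytesPerSecond) {
        if (maxReleaseBytesPerSecond <= 0) {
            throw new IllegalArgumentException("release rate must be positive");
        }
        this.usedMemory = new AtomicLong(initialUsedMemory);
        this.maxReleaseBytesPerSecond = maxReleaseBytesPerSecond;
    }

    // Time to wait before releasing `bytes` at the given rate.
    static long delayMillis(long bytes, long bytesPerSecond) {
        return bytes * 1000L / bytesPerSecond;
    }

    // Waits for the computed delay, then decrements the counter.
    void releaseDropped(long bytes) {
        try {
            Thread.sleep(delayMillis(bytes, maxReleaseBytesPerSecond));
        } catch (InterruptedException e) {
            // Preserve the interrupt flag; still release so accounting stays correct.
            Thread.currentThread().interrupt();
        }
        usedMemory.addAndGet(-bytes);
    }

    long usedMemory() {
        return usedMemory.get();
    }

    public static void main(String[] args) {
        ThrottledMemoryRelease r = new ThrottledMemoryRelease(100, 10_000);
        r.releaseDropped(10);
        System.out.println("usedMemory after drop: " + r.usedMemory());
    }
}
```

The trade-off is that dropping events becomes slower, so the rate has to be high enough that the pending queue still drains.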
[Improvement] `rss.client.assignment.shuffle.nodes.max` should not be less than `rss.data.replica`
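The constraint in this title could be enforced with a fail-fast check when the client configuration is read. The following is a minimal sketch under that assumption; `AssignmentConfigCheck` and `validate` are hypothetical names and are not part of Uniffle's API.

```java
// Hypothetical sketch: reject configurations where fewer shuffle nodes
// may be assigned than there are replicas to place.
class AssignmentConfigCheck {
    static void validate(int shuffleNodesMax, int dataReplica) {
        if (dataReplica < 1) {
            throw new IllegalArgumentException(
                "rss.data.replica must be at least 1, got " + dataReplica);
        }
        if (shuffleNodesMax < dataReplica) {
            throw new IllegalArgumentException(
                "rss.client.assignment.shuffle.nodes.max (" + shuffleNodesMax
                    + ") should not be less than rss.data.replica (" + dataReplica + ")");
        }
    }

    public static void main(String[] args) {
        validate(3, 2); // accepted: enough nodes for every replica
        try {
            validate(1, 3);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```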
### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) ### Search before asking - [X] I have searched in the [issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and found no...
### Discussed in https://github.com/apache/incubator-uniffle/discussions/1268 Originally posted by **youjeongsue** October 27, 2023 Hi. I'm setting up a prometheus pushgateway to collect metrics. https://github.com/apache/incubator-uniffle/blob/master/docs/metrics_guide.md#report-metrics-to-prometheus-automatically ``` rss.metrics.prometheus.pushgateway.addr rss-pushgateway...com:80 rss.metrics.prometheus.pushgateway.jobname rss-shuffle-server rss.metrics.reporter.class org.apache.uniffle.common.metrics.prometheus.PrometheusPushGatewayMetricReporter ```...