
[CELEBORN-1451] HPA support

Open lianneli opened this issue 1 year ago • 1 comments

What changes were proposed in this pull request?

  1. Add a HorizontalPodAutoscaler for the worker in the Helm chart.
  2. Add a HorizontalPodAutoscaler test.
  3. Add a lifecycle preStop hook to the worker StatefulSet; when HPA scales a worker down, the worker triggers decommission through the HTTP RESTful API (see the sketch after this list).
  4. Delete the duplicated resources key in the worker and master StatefulSets.
  5. Change the app version to 0.6.0.
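
For context, here is a minimal sketch of what such a preStop hook could look like on the worker StatefulSet. The decommission endpoint, port, and grace period are assumptions for illustration, not necessarily what this PR uses:

```yaml
# Hypothetical excerpt from the worker StatefulSet pod template.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 600   # give the worker time to finish decommissioning
      containers:
        - name: celeborn-worker
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  # Ask the worker to decommission itself via its HTTP API before shutdown;
                  # the endpoint path and port (9096) are assumptions for this sketch.
                  - "curl -s -X POST 'http://localhost:9096/exit?type=DECOMMISSION' || true"
```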

Why are the changes needed?

For most of the daytime, Spark tasks are few and shuffle data is nearly empty, so Celeborn does not need many Pods and resources are wasted. HPA can control this automatically.

Does this PR introduce any user-facing change?

No. I added a switch for the HPA and its default value is false.
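
For illustration, the switch could look roughly like this in values.yaml (the key names here are hypothetical, not necessarily the exact keys used in the chart):

```yaml
# Hypothetical values.yaml excerpt: HPA is opt-in and disabled by default,
# so existing deployments see no behavior change.
worker:
  hpa:
    enabled: false
```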

How was this patch tested?

Tested locally and in dev environment.

lianneli avatar Sep 30 '24 10:09 lianneli

@lianneli This is a great feature. On what metrics it will upscale/downscale. Is there any document for this?

s0nskar avatar Sep 30 '24 11:09 s0nskar

@s0nskar The official doc is https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/.

For more convenient use, I set a default in values.yaml: CPU utilization of the worker pods is the decisive metric. When CPU utilization stays above 70% for 10s, the workers scale up; when it stays below 70% for 300s, they scale down.
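
As a rough illustration of those defaults, the generated HPA would look something like the following (the resource names and replica bounds are assumptions; the 70% target and the 10s/300s windows come from the description above):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: celeborn-worker            # assumed name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: celeborn-worker          # assumed name
  minReplicas: 1                   # assumed bounds
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # the decisive metric and threshold
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 10    # above 70% for 10s -> scale up
    scaleDown:
      stabilizationWindowSeconds: 300   # below 70% for 300s -> scale down
```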

There are still some risks involved since the worker may still be serving shuffle data. Although worker pods trigger decommission before shutting down, it's highly recommended to set celeborn.client.push.replicate.enabled to true for more stable performance.

lianneli avatar Oct 08 '24 03:10 lianneli

Thanks @lianneli for supporting HPA in Celeborn. But for the Celeborn StatefulSet, I believe there are several shortcomings with HPA and the current implementation:

  1. A worker may still be working, even though worker pods trigger decommission before closing, as you noted.
  2. Once a worker pod has been decommissioned, there is currently no mechanism to recommission it. Consequently, if the cluster experiences increased demand after decommissioning, new worker pods must be spun up; the cluster may have low resource efficiency, or data may be lost because of 1.
  3. Settings such as the stabilization window (stabilizationWindowSeconds), or the ability to dynamically enable/disable scaling up/down, are fixed at deployment time. Adjusting these settings or altering the number of replicas in the StatefulSet requires a redeployment; IMO we should support changing these parameters/configurations at runtime.
  4. The solution is limited if we want to support custom scaling behavior/metrics (network/disk space/memory/CPU?) or a ResourceManager (internal platform) that does not talk to Kubernetes directly.

Someone in the community has also proposed a solution for scaling (it may be sent to the dev mailing list later); we can discuss these two approaches to scaling Celeborn.

RexXiong avatar Oct 08 '24 06:10 RexXiong

@RexXiong That solution is great. I will follow up on the discussion through the mailing list.

lianneli avatar Oct 08 '24 08:10 lianneli

This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] avatar Oct 28 '24 08:10 github-actions[bot]

This issue was closed because it has been staled for 10 days with no activity.

github-actions[bot] avatar Nov 08 '24 08:11 github-actions[bot]