
Feature: Support hot update executor priorityClassName

Open houyuting opened this issue 2 weeks ago • 1 comments

Purpose of this PR

Allow the executor priorityClassName to be updated without restarting the SparkApplication.

Change Category

  • [ ] Bugfix (non-breaking change which fixes an issue)
  • [x] Feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that could affect existing functionality)
  • [ ] Documentation update

Rationale

Checklist

  • [x] I have conducted a self-review of my own code.
  • [ ] I have updated documentation accordingly.
  • [ ] I have added tests that prove my changes are effective or that my feature works.
  • [ ] Existing unit tests pass locally with my changes.

Additional Notes

houyuting avatar Dec 10 '25 04:12 houyuting

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please assign jacobsalway for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

google-oss-prow[bot] avatar Dec 10 '25 04:12 google-oss-prow[bot]

This will not work because the lifecycle of executor pods is managed by the Spark driver, and the driver process does not reload its configuration while it is running.

ChenYi015 avatar Dec 12 '25 09:12 ChenYi015

This will not work because the lifecycle of executor pods is managed by the Spark driver, and the driver process does not reload its configuration while it is running.

@ChenYi015 Hi. The driver does not need to reload the configuration because priorityClassName takes effect at the Pod level. The purpose of this feature is that when the executor priorityClassName of a Spark application changes, the Spark job itself does not need to be restarted; the new priorityClassName only needs to take effect on newly created Pods. With this feature, we can support the following scenarios:

  1. If the job is in the submitted state, all Pods can use the new priorityClassName.
  2. If the job is in the running state and has already obtained all its Pods, an external controller can apply the new priorityClassName by deleting Pods so that the newly created Pods pick up the new value. If the job has not yet obtained all its Pods, you can instead let only newly created Pods use the new priorityClassName. The exact behavior can be adjusted to the business scenario. For example:
  • low -> high: manually delete the pending Pods so that the high priorityClassName takes effect on the new Pods. This is a faster way to obtain resources.
  • high -> low: manually delete all running Pods. This releases the Pods that hold the high priorityClassName, and the new Pods are requeued with the low priorityClassName.
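The scenarios above can be illustrated with a small simulation. This is only a sketch in Python, not the operator's Go code: `Pod`, `SparkApp`, `mutate_pod`, and `pods_to_delete` are hypothetical names, the integer ranks stand in for the relative priority of the old and new classes, and the deletion policy is one possible reading of the two bullets. The point it shows is that the value is read from the application spec when a Pod is created, so a change reaches new Pods and leaves existing ones untouched.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pod:
    name: str
    role: str                      # "driver" or "executor"
    phase: str = "Pending"         # "Pending" or "Running"
    priority_class_name: Optional[str] = None


@dataclass
class SparkApp:
    # Hypothetical stand-in for the executor priorityClassName in the app spec.
    executor_priority_class_name: Optional[str] = None


def mutate_pod(pod: Pod, app: SparkApp) -> Pod:
    """Stand-in for the admission-time mutation: runs once, at Pod creation."""
    if pod.role == "executor" and app.executor_priority_class_name:
        pod.priority_class_name = app.executor_priority_class_name
    return pod


def pods_to_delete(pods: List[Pod], old_rank: int, new_rank: int) -> List[Pod]:
    """Assumed external-controller policy for a running job."""
    executors = [p for p in pods if p.role == "executor"]
    if new_rank > old_rank:
        # low -> high: recreate only the Pods that are still waiting.
        return [p for p in executors if p.phase == "Pending"]
    if new_rank < old_rank:
        # high -> low: release the running high-priority Pods.
        return [p for p in executors if p.phase == "Running"]
    return []


app = SparkApp(executor_priority_class_name="low")
exec1 = mutate_pod(Pod("exec-1", "executor", phase="Running"), app)
exec2 = mutate_pod(Pod("exec-2", "executor"), app)

app.executor_priority_class_name = "high"   # hot update, no job restart
exec3 = mutate_pod(Pod("exec-3", "executor"), app)

print(exec1.priority_class_name, exec3.priority_class_name)  # low high
```

In this sketch `exec-1` and `exec-2` keep the old value until something deletes them, which is why the decision about which Pods to delete is left to an external controller.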

This capability is already being used in our production environment and works correctly.

houyuting avatar Dec 12 '25 10:12 houyuting

The driver does not need to reload the configuration because priorityClassName takes effect at the Pod level.

Yes, you are right. I just realized that the priorityClassName of executor pods is mutated by the webhook:

https://github.com/kubeflow/spark-operator/blob/a531c93bdf9b449a55e46748a9106f6f2dc3fc1b/internal/webhook/sparkpod_defaulter.go#L534-L550

ChenYi015 avatar Dec 12 '25 11:12 ChenYi015

The driver does not need to reload the configuration because priorityClassName takes effect at the Pod level.

Yes, you are right. I just realized that the priorityClassName of executor pods is mutated by the webhook:

https://github.com/kubeflow/spark-operator/blob/a531c93bdf9b449a55e46748a9106f6f2dc3fc1b/internal/webhook/sparkpod_defaulter.go#L534-L550

Hi @ChenYi015. The priorityClassName passed into this piece of code through the parameter app *v1beta2.SparkApplication is already the updated value.

houyuting avatar Dec 15 '25 03:12 houyuting