training-operator icon indicating copy to clipboard operation
training-operator copied to clipboard

Extend the PodTemplateOverrides structure with "manager" field

Open mimowo opened this issue 5 months ago • 20 comments

What you would like to be added?

I would like to extend the PodSetOverrides for the optional "manager" field.

This is needed for the integration with Kueue done properly.

When Kueue starts a Job it is using the PodSetOverrides to set NodeLabels, Tolerations, based on the selected ResourceFlavor. Then, on suspending the Job Kueue needs to "revert" (remove) the overrides.

However, it is not clear from Kueue perspecive which overrides it did, and which were done by a user, or another controller. So, to overcome this we temporarily have a workaround by injecting the "kueue.x-k8s.io/override-idx" annotation which indicates the modified "override" by Kueue.

Why is this needed?

To make the interface between the Kueue and TrainJob easier to understand.

Probably other controllers or users who modify the "PodSetOverrides" could also benefit by annotating "who" owns the override.

Love this feature?

Give it a 👍 We prioritize the features with most 👍

mimowo avatar Sep 25 '25 08:09 mimowo

cc @tenzen-y @andreyvelich @astefanutti @kaisoz

This is also briefly discussed here: https://github.com/kubernetes-sigs/kueue/issues/6865#issuecomment-3329173383

mimowo avatar Sep 25 '25 08:09 mimowo

Actually, now I wonder maybe this is not necessarily needed if we could introspect the field manager by the TrainJob .ManagedFields. This should generally keep the actor already, but I don't have much experience with that.

Also, embedding into spec would make it clear when debugging.

However, this is potentially another path worth exploring. cc @kaisoz

mimowo avatar Sep 25 '25 08:09 mimowo

x-referencing the issue at the Kueue side: https://github.com/kubernetes-sigs/kueue/issues/7000

mimowo avatar Sep 25 '25 08:09 mimowo

Actually, now I wonder maybe this is not necessarily needed if we could introspect the field manager by the TrainJob .ManagedFields. This should generally keep the actor already, but I don't have much experience with that.

@mimowo yes that would also be my intuition. The changes applied by Kueue should be available in the managed fields of the TrainJob.

Maybe helper functions like this one be helpful: https://pkg.go.dev/k8s.io/apimachinery/pkg/util/managedfields#ExtractInto

astefanutti avatar Sep 25 '25 14:09 astefanutti

AFAIK, in case of client-side update, the field could be overridden by another field manager w/o strict mode. But in case of SSA, the field could not be overriddeden by another field manager w/o strict mode.

So, in case of both, the field could be updated and owner could be stolen by another one. I think that it will probably be problematic.

tenzen-y avatar Sep 25 '25 15:09 tenzen-y

Maybe it is not problematic that much. First of all it would mean buggy integration between two orchestrators. Second it seems preventable if the Override is immutable, then the second orchestrator sending Patch or SSA would be validated against the mutation.

mimowo avatar Sep 25 '25 15:09 mimowo

First of all it would mean buggy integration between two orchestrators

I would also tend to consider it a bug.

astefanutti avatar Sep 25 '25 16:09 astefanutti

That's a good point. In that case, we have 2 approaches:

  1. Introducing immutable restrictions to PodSpecOverride while TrainJob is not suspended.
  2. Introduce managed field to each PodSpecOverride field like podSpecOverride[*].spec.fieldManager

I'm ok with either approach.

tenzen-y avatar Sep 26 '25 14:09 tenzen-y

We found that the TrainJob admission webhook rejects TrainJobs with more than one PodSpecOverride targeting the same job. This effectively breaks the integration if the user supplied TrainJob already contains a podSpecOverride. I created an issue in the Trainer repo to discuss it: https://github.com/kubeflow/trainer/issues/2871

kaisoz avatar Oct 01 '25 21:10 kaisoz

We found that the TrainJob admission webhook rejects TrainJobs with more than one PodSpecOverride targeting the same job. This effectively breaks the integration if the user supplied TrainJob already contains a podSpecOverride. I created an issue in the Trainer repo to discuss it: #2871

This has been solved and merged into master in #2880

kaisoz avatar Oct 08 '25 13:10 kaisoz

Actually, now I wonder maybe this is not necessarily needed if we could introspect the field manager by the TrainJob .ManagedFields. This should generally keep the actor already, but I don't have much experience with that.

Also, embedding into spec would make it clear when debugging.

However, this is potentially another path worth exploring. cc @kaisoz

Unfortunately, we can't use the ManagedFields here because it only shows that Kueue modified the podSpecOverrides, not thet values it added:

- apiVersion: trainer.kubeflow.org/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:kueue.x-k8s.io/trainjob-override-idx: {}
      f:spec:
        f:podSpecOverrides: {}
        f:suspend: {}
    manager: kueue
    operation: Update
    time: "2025-10-02T13:34:49Z"

So it seems we'll need to go with the "manager" field in the PodSpecOverrides. I'll take care of this if there's no objection

/assign

kaisoz avatar Oct 08 '25 13:10 kaisoz

@kaisoz that may be because the PodSpecOverrides field has the +listType=atomic merge strategy.

It may be possible to change it to a granular type for which I would expect the individual updates tracked by managers:

https://kubernetes.io/docs/reference/using-api/server-side-apply/#merge-strategy

astefanutti avatar Oct 08 '25 13:10 astefanutti

Ah, worth checking, though it can be still a bit tricky, as the ManagedFields will probably look different after setting the first item by the first orchestrator (initializing the podTemplateOverrides structure), and differently when new items are added, but worth checking.

mimowo avatar Oct 10 '25 10:10 mimowo

/retitle Extend the PodTemplateOverrides structure with "manager" field Adjusting after https://github.com/kubeflow/trainer/pull/2785

mimowo avatar Oct 10 '25 10:10 mimowo

@mimowo: Re-titling can only be requested by trusted users, like repository collaborators.

In response to this:

/retitle Extend the PodTemplateOverrides structure with "manager" field Adjusting after https://github.com/kubeflow/trainer/pull/2785

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] avatar Oct 10 '25 10:10 google-oss-prow[bot]

@kaisoz that may be because the PodSpecOverrides field has the +listType=atomic merge strategy.

I think this is a great idea on its own, we could start here.

mimowo avatar Oct 15 '25 15:10 mimowo

/area api /remove-label lifecycle/needs-triage

andreyvelich avatar Oct 15 '25 15:10 andreyvelich

As I synced with @kaisoz he was looking into this already and apparently when using +listType=map it requires specifying +listMapKey, which uniquely identifies each element of the list. However, in case of the podTemplateOverrides I think we don't have a good candidate field currently.

However, since in 2.1.0 we will have "metadata" included in the podTemplateOverrides we could use this structure to mark "managed-by-kueue". Still, it seems having the dedicated "manager" field is the cleanest approach.

mimowo avatar Oct 15 '25 15:10 mimowo

As I synced with @kaisoz he was looking into this already and apparently when using +listType=map it requires specifying +listMapKey, which uniquely identifies each element of the list. However, in case of the podTemplateOverrides I think we don't have a good candidate field currently.

exactly, listType=map requires a listMapKey which must be a scalar (string, int, etc..) and there's none in podTemplateOverrides. If you provide a non-scalar field it just doesn't build

kaisoz avatar Oct 15 '25 15:10 kaisoz

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jan 13 '26 20:01 github-actions[bot]

/milestone v2.2

andreyvelich avatar Jan 21 '26 23:01 andreyvelich