feat: Add the manager field to the podTemplateOverride object
What this PR does / why we need it:
This PR adds a manager field to the podTemplateOverrides object, indicating who created (owns) the object. This is useful for external controllers such as Kueue to identify which entries they added. This manager field is optional and is defaulted by a mutating webhook, which sets the request username for podTemplateOverrides that donโt have a manager.
Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #2856
Checklist:
- [ ] Docs included if any changes are user facing
Pull Request Test Coverage Report for Build 21167022709
Details
- 14 of 25 (56.0%) changed or added relevant lines in 2 files are covered.
- No unchanged relevant lines lost coverage.
- Overall coverage increased (+0.1%) to 51.495%
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| pkg/webhooks/trainjob_webhook.go | 14 | 19 | 73.68% |
| pkg/util/cert/cert.go | 0 | 6 | 0.0% |
| <!-- | Total: | 14 | 25 |
| Totals | |
|---|---|
| Change from base Build 21161934063: | 0.1% |
| Covered Lines: | 1257 |
| Relevant Lines: | 2441 |
๐ - Coveralls
cc @tenzen-y @andreyvelich @astefanutti @mimowo
I also tested this manually in kind by deploying a Trainjob with a podTemplateOverride entry in a cluster with Kueue installed. The mutating webhook added manager=kubernetes-admin to my podTemplateOverride object, and manager=system:serviceaccount:kueue-system:kueue-controller-manager (the Kueue service account) to the ones added by Kueue.
The failing test seems to be a flake (Failed executing command with error: build process: clone golangci-lint: git clone --branch v2.4.0 --single-branch --depth 1 -c advice.detachedHead=false -q https://github.com/golangci/golangci-lint.git: exit status 128) Could anyone re-run it for me? Thanks!
/retest
/retest
I thought I couldn't do this here since the repo uses GH actions... ๐คฆ๐ป
/retest
@kaisoz: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.
In response to this:
/retest
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/retest
@kaisoz: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.
In response to this:
/ok-to-test
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@kaisoz: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.
In response to this:
/retest
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@kaisoz: Cannot trigger testing until a trusted user reviews the PR and leaves an
/ok-to-testmessage.
@astefanutti prow doesn't trust me ๐ . I did run the clone line in machine and it works, I expect it to work now in CI..
/ok-to-test
/restest
/retest
@kaisoz there is an issue with CI that should be fixed after you rebase this on top of #3023.
@kaisoz there is an issue with CI that should be fixed after you rebase this on top of #3023.
great @astefanutti ! Thanks! Then I think I'll wait for #3023 to be merged and then rebase ๐
@kaisoz #3023 has been merged if you want to rebase on top.
/retest
@astefanutti all green now! PTAL whenever you have time. Thanks! ๐
@kaisoz that looks good to me overall, thanks!
I understand the webhook would automatically set the manager field to "mimic" SSA behavior, but just to be sure we don't introduce too much complexity nor "re-invent" the wheel, would having Kueue set the manager field directly in the PodTemplateOverride be working just as good?
Hey @astefanutti thanks for the review! Yes! it does mimic SSA but a more fine-grained entry level. And I agree, Kueue can just set the manager field directly, but having a mutating webhook adds some benefits:
- Allows the manager field to be optional. Users don't need to manually set the
manager, the webhook does it automatically - Allows to set
+listMap=mapwith+listMapKey=managerso that entries are modified individually (also comes with better observability becausemanagedFieldswill show what entries were modified instead of the whole list)
2 needs a stable key, that's where the webhook comes into play (it ensures that the field will never be empty). However, I just thought that for better integrity, maybe the manager field should always be set by the webhook so that a user can't claim a controller identity...
WDYT? If you still think it's too much I can drop the webhook but I still think it adds some benefits ๐
@kaisoz thanks, that makes sense. The complexity is reasonable enough and it has some benefits as you've highlighted.
@andreyvelich @tenzen-y do you have any opinions?
I like the idea of defaulting the manager to req.UserInfo.Username if it is empty, however will it introduce any problems in the future if we decide to make PodTemplateOverrides mutable for TrainJob?
I like the idea of defaulting the manager to req.UserInfo.Username if it is empty, however will it introduce any problems in the future if we decide to make PodTemplateOverrides mutable for TrainJob?
I might be missing something, but PodTemplateOverrides are already mutable when a TrainJob is suspended. Is there an additional kind of mutability youโre referring to here?
With this implementation, manager is only defaulted at creation time if it is unset. On updates, the existing value is preserved. That said, this still allows any user to explicitly change manager and claim an entry that was originally set by a controller. To avoid this, one option would be to make the field immutable and disallow changes after creation.
I might be missing something, but PodTemplateOverrides are already mutable when a TrainJob is suspended. Is there an additional kind of mutability youโre referring to here?
Do we know if Kueue requires to have this field mutable for suspended TrainJobs?
Do we know if Kueue requires to have this field mutable for suspended TrainJobs?
Yes, before resuming a job, Kueue injects its scheduling info via PodTemplateOverrides. With this change, Kueue would either set the appropriate manager on each newly added entry, or leave it empty so the webhook can default it to its system account.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please ask for approval from tenzen-y. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment