scheduler-plugins
scheduler-plugins copied to clipboard
Initial commit Network-Aware framework
What type of PR is this?
This is the initial PR for the network-aware framework approved in this KEP.
The PR includes two custom resources (AppGroup and NetworkTopology), three plugins (Networkaware), and one controller (AppGroup).
/kind feature
What this PR does / why we need it:
This proposal introduces an end-to-end solution to model/weight a cluster's network latency and topological information, and leverage that to better schedule latency- and bandwidth-sensitive workloads. Further information is available in our merged KEP.
Special notes for your reviewer:
After several interactions in the KEP PR, we have updated our design and we submit the first version of the proposed network-aware framework. All feedback is welcome. Thanks.
Cc: @Huang-Wei
Hi @jpedro1992. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test
on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test
label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
cc: @ravisantoshgudimetla @ingvagabund and @damemi
/ok-to-test
/retest
The integration tests fail because of the "api-approved.kubernetes.io" in the two added CRDs (AppGroup & NetworkTopology). I was able to run my integration tests successfully locally after I manually edited the "api-approved.kubernetes.io". So, how should I proceed to take care of this issue for this PR since both CRDs are not approved yet? What is the proper way to add novel CRDs and to solve the issue raised by "api-approved.kubernetes.io"? Thanks.
CRD error:
After manually editing "api-approved.kubernetes.io":
So, how should I proceed to take care of this issue for this PR since both CRDs are not approved yet? What is the proper way to add novel CRDs and to solve the issue raised by "api-approved.kubernetes.io"? Thanks.
@jpedro1992 you can just check in changes to include the "api-approved.kubernetes.io" fields, just like other CRDs do. Don't forget to append # edited manually
in the end.
So, how should I proceed to take care of this issue for this PR since both CRDs are not approved yet? What is the proper way to add novel CRDs and to solve the issue raised by "api-approved.kubernetes.io"? Thanks.
@jpedro1992 you can just check in changes to include the "api-approved.kubernetes.io" fields, just like other CRDs do. Don't forget to append
# edited manually
in the end.
@Huang-Wei Thank you for getting back to me.
Currently, I have the following "api-approved.kubernetes.io" field in the AppGroup and NetworkTopology CRDs:
api-approved.kubernetes.io: https://github.com/kubernetes-sigs/scheduler-plugins/pull/348 # edited manually
I run hack/verify-crdgen.sh
and it still fails. Then, if I manually edit the CRDs in the manifests/crds
folder with the "api-approved.kubernetes.io" fields, the integration tests pass:
Should I amend my commit with this version, or is there any other way to solve this issue?
@jpedro1992 I got some suggestions from other reviewers to split the big PR into several commits. E.g. CRD/data type definitions + clientsets, Sorting + Filtering + Scoring implementation, examples, etc.
Should I amend my commit with this version
yes, please do.
@wangchen615 's suggestion is valid. It'd be very helpful for reviewing if you can split this PR into 2 smaller ones:
- APIs, including versioned/unversioned type definitions, validation, defaulting, conversion and tests (if needed)
- Pugin implementations and tests (UT and integration)
Hi @Huang-Wei,
I believe I solved the issues with the integration tests. Also, I have reduced the size of the PR:
- CRDs + API definitions
Perhaps better to wait on the revision of this commit before pushing the Plugin Implementations + Tests.
Thanks
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity,
lifecycle/stale
is applied - After 30d of inactivity since
lifecycle/stale
was applied,lifecycle/rotten
is applied - After 30d of inactivity since
lifecycle/rotten
was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with
/remove-lifecycle stale
- Mark this issue or PR as rotten with
/lifecycle rotten
- Close this issue or PR with
/close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Please let me know if I can provide further help/info on the PR revision.
It'd be very helpful for reviewing if you can split this PR into 2 smaller ones:
APIs, including versioned/unversioned type definitions, validation, defaulting, conversion and tests (if needed) Plugin implementations and tests (UT and integration)
Both points are not relevant until a plugin is introduced given this PR currently introduces only CRDs and a controller. Since CRD types under https://github.com/kubernetes-sigs/scheduler-plugins/tree/master/apis/scheduling have their own validation/defaulting.
@Huang-Wei is there any policy against introducing new CRDs and their corresponding controllers ahead of plugins? The AppGroup type and its controller will probably have no use beside recomputing the topological sorting. On the other hand, is the idea of this repository to always provide fully functioning plugins? Or, rather allowing the community to also continuously accept controllers which provide selected/pre-computed view of the world? So consumers of the repository can build the scheduling framework as combination of already existing plugins/controllers + other custom plugins/controllers not yet merged in the repository.
Both points are not relevant until a plugin is introduced given this PR currently introduces only CRDs and a controller. Since CRD types under
master
/apis/scheduling have their own validation/defaulting.
IIRC, that comment was made when the PR was combining all things together.
is there any policy against introducing new CRDs and their corresponding controllers ahead of plugins?
To make this repo sustainable (given limited reviewers' cycles), we have to use our bandwidth efficiently and focus on scheduling bits. So, I'm a bit inclined to have AppGroup and its controllers resided temporarily in its own repo, and moved back (if necessary) once it gets one or two release cycles to soak. Even if it's hosted outside, we can still compile it into our controller image. (it's just we need to be cautious to avoid cyclic dependency) In this way, @jpedro1992 you can evolve API and controller at your own pace. And in this repo, we focus on scheduling portion for now so that we won't be the blocker to see this plugin's debut in scheduler-plugins. WDYT?
BTW: an example is that NodeResourceTopology is hosted outside, but it doesn't mean we cannot build the scheduler plugin here.
Hi @ingvagabund and @Huang-Wei,
Thank you for your revision! I will be working on the suggested code modifications.
For me it's fine if we keep the CRDs + controllers outside of sig-scheduling for now. Is there any guide / example I could follow from another plugin to know exactly how to host the components outside of sig-scheduling and build them inside?
I cannot just simply remove the CRDs + controllers and add the plugins since they require the AppGroup CRD.
Thanks in advance!
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: jpedro1992
Once this PR has been reviewed and has the lgtm label, please assign denkensk for approval by writing /assign @denkensk
in a comment. For more information see:The Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve
in a comment
Approvers can cancel approval by writing /approve cancel
in a comment
Hi @ingvagabund and @Huang-Wei,
We decided to host the multiple components in the following repo: Diktyo-io
Regarding the PR, do you prefer to do the modifications in this PR (remove CRDs + APIs and replace it with plugins) or open a new PR with the plugins?
Thanks in advance!
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity,
lifecycle/stale
is applied - After 30d of inactivity since
lifecycle/stale
was applied,lifecycle/rotten
is applied - After 30d of inactivity since
lifecycle/rotten
was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with
/remove-lifecycle stale
- Mark this issue or PR as rotten with
/lifecycle rotten
- Close this issue or PR with
/close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/close in favor of #432
/close in favor of #432
I agree.