yunikorn-k8shim
yunikorn-k8shim copied to clipboard
[YUNIKORN-802] Supports to assign nodes to non-default partition
What is this PR for?
see comment (https://issues.apache.org/jira/browse/YUNIKORN-22?focusedCommentId=17398860&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17398860)
Currently, all nodes are hardcode to be assigned to "default" partition. That brings two disadvantages.
- we can't select specify nodes, which are used to execute spark job only, from a cluster
- multi-partitions does not work since non-default partition can't get nodes
What type of PR is it?
- [ ] - Bug Fix
- [ ] - Improvement
- [x] - Feature
- [ ] - Documentation
- [ ] - Hot Fix
- [ ] - Refactoring
Todos
- support to change partition assignment of existent node (in this PR, the update request will be skipped)
- support to remove existent node which had been reassigned (in this PR, removing such node cause error message "Failed to update non existing node ...")
What is the Jira issue?
https://issues.apache.org/jira/browse/YUNIKORN-802
How should this be tested?
Screenshots (if appropriate)
Questions:
- [ ] - The licenses files need update.
- [ ] - There is breaking changes for older versions.
- [ ] - It needs documentation.
Codecov Report
Merging #293 (ce9e427) into master (c47ed51) will increase coverage by
3.51%
. The diff coverage is78.36%
.
@@ Coverage Diff @@
## master #293 +/- ##
==========================================
+ Coverage 59.75% 63.27% +3.51%
==========================================
Files 35 37 +2
Lines 3133 3455 +322
==========================================
+ Hits 1872 2186 +314
- Misses 1180 1182 +2
- Partials 81 87 +6
Impacted Files | Coverage Δ | |
---|---|---|
pkg/appmgmt/appmgmt_recovery.go | 67.50% <0.00%> (-8.18%) |
:arrow_down: |
pkg/cache/amprotocol_mock.go | 0.00% <0.00%> (ø) |
|
pkg/cache/context_recovery.go | 45.23% <0.00%> (-1.11%) |
:arrow_down: |
pkg/cache/external/scheduler_cache.go | 34.56% <0.00%> (+0.40%) |
:arrow_up: |
pkg/common/events/recorder_mock.go | 0.00% <0.00%> (ø) |
|
pkg/common/events/states.go | 0.00% <0.00%> (ø) |
|
pkg/controller/application/app_controller.go | 71.05% <ø> (-0.26%) |
:arrow_down: |
...missioncontrollers/webhook/admission_controller.go | 33.74% <0.00%> (+1.00%) |
:arrow_up: |
pkg/shim/main.go | 0.00% <ø> (ø) |
|
pkg/common/events/recorder.go | 33.33% <40.00%> (+3.33%) |
:arrow_up: |
... and 28 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 0bd9366...ce9e427. Read the comment docs.
Lots has changed since this was started. I still think this is a good idea to fix. There are other changes going in for YUNIKORN-1090 and YUNIKORN-1091, I think we need to clean that up before we restart this one.
Hi. Is there any further progress by any chance? I'm waiting as this feature is very useful and needed.
Hi. Is there any further progress by any chance? I'm waiting as this feature is very useful and needed.
hi @chia7712 are you still actively working on this ticket? If not, i can help the work.
@lixmgl feel free to take over it. thanks :)
@lixmgl feel free to take over it. thanks :)
Will do :)
This is a very stale PR at this point - I also think we need a design doc on this before proceeding. See comments on https://issues.apache.org/jira/browse/YUNIKORN-22.
Any progress?
@heumsi multiple partition support needs an extensive design doc before we can proceed as there are many complications, especially with Kubernetes. As far as I know, the design work has not been completed yet.
Hi @heumsi, in YuniKorn, if we only use labels to group nodes into different partitions and use labels to assign applications to different nodes, I think this can be achieved by nodeSelector. Not sure whether you have other scenarios which must be done with different partitions in YuniKorn. If yes, could you share in the thread? Thank you!
Does it make sense to have this PR open? We don't even have SchedulerNode
and nodes.go
anymore. This change needs a complete rewrite, but as mentioned above, we first need a design doc, which doesn't exist.
I suggest closing this. cc @chia7712
Does it make sense to have this PR open? We don't even have
SchedulerNode
andnodes.go
anymore. This change needs a complete rewrite, but as mentioned above, we first need a design doc, which doesn't exist.I suggest closing this. cc @chia7712
I would go so far as to close YUNIKORN-802 and the parent umbrella YUNIKORN-22. This has been open for years, and no one has yet put forth a coherent design for doing partitions in Kubernetes. I'm pretty convinced its not possible without breaking the whole thing.
@pbacsko @craigcondit agreed. let's me cleanup related tickets :)