[YUNIKORN-1566] Handle leaf-parent conversion with running applications
What is this PR for?
Applying a new config for a queue should be an all-or-nothing operation: a failure must not leave the hierarchy in with inconsistent configuration.
What type of PR is it?
- [ ] - Bug Fix
- [x] - Improvement
- [x] - Feature
- [ ] - Documentation
- [ ] - Hot Fix
- [ ] - Refactoring
Todos
- [ ] - Task
What is the Jira issue?
https://issues.apache.org/jira/browse/YUNIKORN-1566
How should this be tested?
Screenshots (if appropriate)
Questions:
- [ ] - The licenses files need update.
- [ ] - There is breaking changes for older versions.
- [ ] - It needs documentation.
Codecov Report
Merging #502 (c0aff34) into master (993f07f) will increase coverage by
0.15%. The diff coverage is75.90%.
@@ Coverage Diff @@
## master #502 +/- ##
==========================================
+ Coverage 74.84% 75.00% +0.15%
==========================================
Files 71 73 +2
Lines 11537 11945 +408
==========================================
+ Hits 8635 8959 +324
- Misses 2597 2664 +67
- Partials 305 322 +17
| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/common/configs/config.go | 100.00% <ø> (ø) |
|
| pkg/scheduler/policies/delete_policy.go | 0.00% <0.00%> (ø) |
|
| pkg/scheduler/partition.go | 76.33% <41.66%> (-0.90%) |
:arrow_down: |
| pkg/common/configs/configvalidator.go | 88.53% <57.14%> (-0.52%) |
:arrow_down: |
| pkg/scheduler/objects/queue.go | 76.64% <60.00%> (-0.26%) |
:arrow_down: |
| pkg/scheduler/ugm/manager.go | 65.13% <75.00%> (+6.23%) |
:arrow_up: |
| pkg/scheduler/objects/queue_moveapp.go | 79.54% <79.54%> (ø) |
|
| pkg/scheduler/ugm/queue_tracker.go | 86.61% <98.24%> (+5.17%) |
:arrow_up: |
| pkg/scheduler/ugm/group_tracker.go | 100.00% <100.00%> (ø) |
|
| pkg/scheduler/ugm/user_tracker.go | 100.00% <100.00%> (ø) |
... and 1 file with indirect coverage changes
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
Looking through the changes I am not too worried about the new config not being OK to apply. The config validation looks at the new config and makes sure it all parses correctly and that the structure is valid. If we have a gap in that area we should update the config validator. BTW: I think we are missing the template validation against the queue it is defined on. We also copy the template to a parent queue below the point it is defined without validating.
The new configured values should be applied recursively from the top down so that should all work.
What I am worried about is that the new config does not map correctly to the existing in memory config. These are some of the cases I think we do not check for:
- the in memory config shows a queue
root.parent.leafThe newly created config has only gotroot.parent. Do we remove the queueleaf? Parent does not change type to a leaf and it was an auto parent before. How do we handle this... - the in memory config shows
root.my-queuedefined as a leaf. The newly created config hasroot.my-queue.spark. That as far as I can tell causes an automatic type change to parent for my-queue. What happens with the applications that run there? - the in memory config shows
root.my-queue.sparkBothmy-queueandsparkare dynamic queues. The newly created config hasroot.my-queueas a leaf. That breaks things badly I think.
Looking through the changes I am not too worried about the new config not being OK to apply. The config validation looks at the new config and makes sure it all parses correctly and that the structure is valid. If we have a gap in that area we should update the config validator.
I checked the config validator. Yes, resource and ACL strings can cause problems and those are validated.
The method Queue.setTemplate() can return an error and it's not validated, however if this fails we swallow it and return a nil without printing anything. This is suspect.
- the in memory config shows a queue
root.parent.leafThe newly created config has only gotroot.parent. Do we remove the queueleaf? Parent does not change type to a leaf and it was an auto parent before. How do we handle this...
By default, leaf is marked for removal and goes into Draining state (if it is a managed queue). Applications in that queue keep running. In case of a parent, the entire hierarchy below it is marked for removal.
- the in memory config shows
root.my-queuedefined as a leaf. The newly created config hasroot.my-queue.spark. That as far as I can tell causes an automatic type change to parent for my-queue. What happens with the applications that run there?
I think that is what we partly discussed in the JIRA ticket. We can create a new queue like root.preserved and move existing applications into that queue. Plus we have to update queue references in the Application objects. I can do a POC in this PR and let's see if it is any good.
- the in memory config shows
root.my-queue.sparkBothmy-queueandsparkare dynamic queues. The newly created config hasroot.my-queueas a leaf. That breaks things badly I think.
After some investigation, it looks like the dynamic queues are automatically cleaned up by the partition manager as soon as they become empty. Other than that, I don't see special treatment. I suggest the same approach, running applications should be moved to a pre-defined queue.
Another thing that is important that asks and allocations cause the priorities to be updated and re-calculated. Basically, we also need to do two things:
- Make sure that no preemption/scheduling cycle runs during the config update
- Recalculate all priorities if there was a change
Update: pushed an entirely different approach. It's likely that the JIRA ticket needs to be renamed.
It's reviewable, but it's still in a WIP condition, there are no unit tests. If the leaf is changed to a parent, we move the running applications to a special queue named "root.preserved" and update a bunch of things, eg. Queue object inside the applications, pending resources, running apps, allocated resources, etc.
The entire hierarchy up to root is updated, except root itself. Basically, a certain metric is subtracted from the parent queues and added to "root.preserved". From the perspective of the root queue, nothing changes.
New code is added to a file called "queue_moveapp.go", I believe this part is so specific that it deserves a new place. Later on, moving an application can be generalized.
Deleting leaf queues pose a question, what to do with those? I added a deletion policy: we can keep the old behavior (Drain) or move the applications (Preserve).
Closing this. Way too many changes came in since the last update.