[WIP] Improve Copilot
Tracking issue
Why are the changes needed?
When running ContainerTask, we discovered that the pod stuck on uploader container for a long time after the main task succeed. The uploader should not take so much time. This is because uploader sidecar starts after the task container finishes, so that it cannot detect any active process (the main container already completed), and it waits until timeout (1m40s).
What changes were proposed in this pull request?
Convert the uploader container from "main container" to "sidecar container" (which is the init container with restartPolicy set to Always). This has following benefit:
- Ensure the
uploadercontainer starts before main container - We can trap the
SIGTERMsignal to know when the main container completed. Therefore, we do not need to check if the main container start- Ref: https://kubernetes.io/docs/tutorials/configuration/pod-sidecar-containers/#benefits-of-a-built-in-sidecar-container
In the PR, I made following modification:
- Change uploader container to "sidecar contianer"
- Add a new watcher:
SignalWatcher - Remove the
WaitToStartlogic for uploader - Use
WaitToExitin Signal Watcher to check if the main container completed - Remove kube api watcher, file watcher
- Let copilot use the logger level set in config
How was this patch tested?
Labels
Please add one or more of the following labels to categorize your PR:
- added: For new features.
- changed: For changes in existing functionality.
- deprecated: For soon-to-be-removed features.
- removed: For features being removed.
- fixed: For any bug fixed.
- security: In case of vulnerabilities
This is important to improve the readability of release notes.
Setup process
Screenshots
With my modification, it only takes 3 seconds to upload the result:
Check all the applicable boxes
- [ ] I updated the documentation accordingly.
- [ ] All new and existing tests passed.
- [x] All commits are signed-off.
Related PRs
Docs link
Bito Automatic Review Skipped - Draft PR
Bito didn't auto-review because this pull request is in draft status.
No action is needed if you didn't intend for the agent to review it. Otherwise, to manually trigger a review, type /review in a comment and save.
You can change draft PR review settings here, or contact your Bito workspace admin at [email protected].
Codecov Report
Attention: Patch coverage is 35.48387% with 20 lines in your changes missing coverage. Please review.
Project coverage is 58.67%. Comparing base (
9c536fb) to head (d483259). Report is 2 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...lytecopilot/cmd/containerwatcher/signal_watcher.go | 0.00% | 20 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #6501 +/- ##
==========================================
+ Coverage 58.52% 58.67% +0.14%
==========================================
Files 940 938 -2
Lines 71532 71361 -171
==========================================
+ Hits 41864 41869 +5
+ Misses 26489 26313 -176
Partials 3179 3179
| Flag | Coverage Δ | |
|---|---|---|
| unittests-datacatalog | 59.03% <ø> (ø) |
|
| unittests-flyteadmin | 56.25% <ø> (ø) |
|
| unittests-flytecopilot | 39.56% <0.00%> (+8.56%) |
:arrow_up: |
| unittests-flytectl | 64.72% <ø> (ø) |
|
| unittests-flyteidl | 76.12% <ø> (ø) |
|
| unittests-flyteplugins | 61.09% <100.00%> (+0.01%) |
:arrow_up: |
| unittests-flytepropeller | 54.83% <ø> (ø) |
|
| unittests-flytestdlib | 64.02% <ø> (ø) |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Bito Automatic Review Skipped - Draft PR
Bito didn't auto-review because this pull request is in draft status.
No action is needed if you didn't intend for the agent to review it. Otherwise, to manually trigger a review, type /review in a comment and save.
You can change draft PR review settings here, or contact your Bito workspace admin at [email protected].
Bito Automatic Review Skipped - Draft PR
Bito didn't auto-review because this pull request is in draft status.
No action is needed if you didn't intend for the agent to review it. Otherwise, to manually trigger a review, type /review in a comment and save.
You can change draft PR review settings here, or contact your Bito workspace admin at [email protected].
Bito Automatic Review Skipped - Draft PR
Bito didn't auto-review because this pull request is in draft status.
No action is needed if you didn't intend for the agent to review it. Otherwise, to manually trigger a review, type /review in a comment and save.
You can change draft PR review settings here, or contact your Bito workspace admin at [email protected].
Check the behaviour when main container OOM, the sidecar container will still get SIGTERM and upload output if exists
Also removed the shared namespace watcher as this watcher does not work in the current approach. It will be terminated by SIGTERM
This is great will simplify copilot a lot, make it faster and more secure
Sorry for the late reply!
I just updated the description to include the test script I used and the new result screenshot
Thanks!