fix: move nodeclaim check to right place
Reason for Change:
The PR mainly did two things:
- move nodeclaim check to right place. After creating nodeclaim, we should check the status in following reconcile, rather than polling to check the status.
- Fix a bug: This code
return reconcile.Result{Requeue: false}, erractually cant stop reconciliation, because when error is not nil, thereconcile.Result{Requeue: false}will be ignored. We should usereconcile.TerminalErrorRequirements
- [ ] added unit tests and e2e tests (if applicable).
Issue Fixed:
Notes for Reviewers:
Title
(Describe updated until commit https://github.com/kaito-project/kaito/commit/0cb969fdf8e529bd326cf17c513473b4c7ee6ef6)
Improve NodeClaim Handling and Error Management
Description
-
Moved nodeclaim check to post-creation reconcile
-
Replaced incorrect error handling with
reconcile.TerminalError -
Simplified nodeclaim creation logic by removing unnecessary retries
-
Updated tests to reflect changes
Changes walkthrough 📝
| Relevant files | |||||
|---|---|---|---|---|---|
| Bug fix |
| ||||
| Enhancement |
| ||||
| Tests |
|
Need help?
Type /help how to ...in the comments thread for any questions about PR-Agent usage.Check out the documentation for more information.
PR Reviewer Guide 🔍
(Review updated until commit https://github.com/kaito-project/kaito/commit/0cb969fdf8e529bd326cf17c513473b4c7ee6ef6)
Here are some key observations to aid the review process:
| ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪ |
| 🧪 No relevant tests |
| 🔒 No security concerns identified |
⚡ Recommended focus areas for reviewPossible Issue
|
PR Code Suggestions ✨
Latest suggestions up to 0cb969f
Explore these optional code suggestions:
| Category | Suggestion | Impact |
| Possible issue |
Add error handling for unavailable instance typesHandle the case where the node claim creation fails due to unavailable instance pkg/utils/nodeclaim/nodeclaim.go [246-255]
Suggestion importance[1-10]: 8__ Why: The suggestion addresses a critical issue by adding error handling for unavailable instance types, which can prevent unnecessary retries and improve the robustness of the code. | Medium |
Previous suggestions
Suggestions up to commit d60224c
| Category | Suggestion | Impact |
| Possible issue |
Add error handling after node creationHandle the case where the node class check fails and ensure proper error handling. pkg/utils/nodeclaim/nodeclaim.go [246-256]
Suggestion importance[1-10]: 7__ Why: The suggestion adds error handling after attempting to create a node claim, which is a good practice to ensure that any errors during creation are properly managed and returned. | Medium |
Suggestions up to commit 894f8ec
| Category | Suggestion | Impact |
| Possible issue |
Add error handling after node creationHandle the case where the node class check fails and ensure proper error handling. pkg/utils/nodeclaim/nodeclaim.go [246-256]
Suggestion importance[1-10]: 7__ Why: The suggestion adds error handling after node creation, which is a good practice to ensure that any errors during creation are properly handled and returned. | Medium |
Verify terminal error handlingEnsure the test case correctly simulates the scenario where the SKU is unavailable pkg/workspace/controllers/workspace_controller_test.go [325-338]
Suggestion importance[1-10]: 7__ Why: The suggestion ensures that the test case correctly simulates the scenario where the SKU is unavailable and verifies that the terminal error is returned, which is important for testing the new behavior. | Medium | |
| General |
Update test case for new behaviorUpdate the test case to reflect the new behavior and ensure it tests the correct pkg/utils/nodeclaim/nodeclaim_test.go [24-34]
Suggestion importance[1-10]: 2__ Why: The suggestion does not actually change the test case; it merely repeats the existing code. Therefore, it does not provide any new value or improvement. | Low |
Suggestions up to commit 0dd28d5
| Category | Suggestion | Impact |
| Possible issue |
Check for unavailable instance types post creationHandle the case where the node claim creation fails due to unavailable instance pkg/utils/nodeclaim/nodeclaim.go [246-255]
Suggestion importance[1-10]: 8__ Why: The suggestion addresses a critical issue by handling the case where node claim creation fails due to unavailable instance types, which was previously ignored in the PR. This ensures that the reconcile process stops when the specific error occurs, preventing unnecessary retries and improving the robustness of the code. | Medium |
Suggestions up to commit 14a680e
| Category | Suggestion | Impact |
| Possible issue |
Check SKU availability post-creationHandle the case where the SKU is unavailable after creating the NodeClaim. pkg/utils/nodeclaim/nodeclaim.go [247-253]
Suggestion importance[1-10]: 8__ Why: The suggestion correctly addresses a critical issue by checking SKU availability after creating the NodeClaim, which was previously handled in a retry loop. This ensures that the reconciliation process stops if the SKU is unavailable, preventing unnecessary retries and improving the robustness of the code. | Medium |
PR Code Suggestions ✨
No code suggestions found for the PR.
Nice clean up. Fix UT please.
There is another finding. I just noticed that we used CheckNodeClaimStatus in both WaitForPendingNodeClaims and CreateNodeClaim.
IMO since we have CheckNodeClaimStatus in WaitForPendingNodeClaims, there is no need to CheckNodeClaimStatus in CreateNodeClaim. Should we deleted the CheckNodeClaimStatus in CreateNodeClaim? cc @zhuangqh @Fei-Guo
There is another finding. I just noticed that we used
CheckNodeClaimStatusin bothWaitForPendingNodeClaimsandCreateNodeClaim. IMO since we haveCheckNodeClaimStatusinWaitForPendingNodeClaims, there is no need toCheckNodeClaimStatusinCreateNodeClaim. Should we deleted theCheckNodeClaimStatusinCreateNodeClaim? cc @zhuangqh @Fei-Guo
In CreateNodeClaim, it's CheckNodeClass not CheckNodeClaimStatus ?
It's CheckNodeClaimStatus
Oh I see. you mean WorkspaceReconciler.CreateNodeClaim not this CreateNodeClaim.
If I understand correctly, CheckNodeClaimStatus in CreateNodeClaim is the first time you create nodeclaim and pending waiting for it. WaitForPendingNodeClaims is something when this obj requeue or controller restarts.
If you remove that code, do you want to let it fail immediately at https://github.com/kaito-project/kaito/blob/main/pkg/workspace/controllers/workspace_controller.go#L487 and requeue for pending waiting?
WaitForPendingNodeClaims is mainly used to protect a case where controller is restarted for various reasons, When the reconciliation loop restarts, we want it to wait for the nodeclaim created by previous reconciler loop to be ready before continuing.
No need to fail. Just return nil after creating nodeclaim. Since we watched nodeclaim, if it status changed another reconciliation will be triggered.
Actually, check status after creating can only be performed one time, because if no nodeclaim need to be created it won't be performed, but check status in WaitForPendingNodeClaims can always be performed.
No need to fail. Just return nil after creating nodeclaim. Since we watched nodeclaim, if it status changed another reconciliation will be triggered.
Yes. This way works. I agree with you.
Let us keep all sync loop for now.
Yes, I'm not planning to make the changes in this PR. Just raised for discussion. We need to first enhance the e2e testing capabilities before making major changes or refactoring.
Yes, I'm not planning to make the changes in this PR. Just raised for discussion. We need to first enhance the e2e testing capabilities before making major changes or refactoring.
@cvvz It's a good idea to improve E2E tests at first before improving sync loops.
@Fei-Guo Can you take another look?
@Fei-Guo @zhuangqh This pull request has removed a nodeclaim status check, and make code more readable. maybe we can include it in the 0.4.6 release.