fix(clusterapi): HasInstance with namespace prefix
What type of PR is this?
/kind bug
What this PR does / why we need it:
Fixes the lookup of Machines in the MachineInformer store for the HasInstance case.
Which issue(s) this PR fixes:
Fixes #6774
Special notes for your reviewer:
It would be good to verify this with other kinds of clusters and/or flags (I'm testing on a CAPZ cluster) to make sure the lookup works in all cases.
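For reviewers who want the failure mode in miniature: the informer store keys objects by `namespace/name` (via `cache.MetaNamespaceKeyFunc`), so a lookup by bare machine name silently misses. A minimal, self-contained sketch follows; the object and names are illustrative, not the actual cluster-autoscaler code:

```go
// Sketch: informer stores key objects by "namespace/name", so a
// bare-name lookup silently misses the object.
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/cache"
)

func main() {
	store := cache.NewStore(cache.MetaNamespaceKeyFunc)
	_ = store.Add(&metav1.PartialObjectMetadata{
		ObjectMeta: metav1.ObjectMeta{Namespace: "my-ns", Name: "my-machine"},
	})

	// Bare name: not found, which is the bug this PR fixes for HasInstance.
	_, found, _ := store.GetByKey("my-machine")
	fmt.Println(found) // false

	// Namespace-prefixed key: found.
	_, found, _ = store.GetByKey("my-ns/my-machine")
	fmt.Println(found) // true
}
```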
Does this PR introduce a user-facing change?
NONE
Hi @mweibel. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
ping @elmiko @MaxFedotov for authoring/reviewing the original PR #6708
@mweibel Thanks, that's my bad - I was looking at the `findMachineByProviderID` function as an example:
https://github.com/kubernetes/autoscaler/blob/4f1c8e69a8a4031a531596c26718a262d4b6b716/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_controller.go#L335-L336
Although there is a test for this function, which passed without errors, there are two problems in it (which I missed completely):
- In `TestControllerFindMachineByProviderID`, the part where the machine is found using the `MachineAnnotation` on the `Node` is skipped: https://github.com/kubernetes/autoscaler/blob/4f1c8e69a8a4031a531596c26718a262d4b6b716/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_controller_test.go#L742-L750
- When test data is generated, the `MachineAnnotation` on the `Node` is constructed using the `namespace/machine.Name` format: https://github.com/kubernetes/autoscaler/blob/4f1c8e69a8a4031a531596c26718a262d4b6b716/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_controller_test.go#L461
  While in cluster-api it is just `machine.Name`: https://github.com/kubernetes-sigs/cluster-api/blob/b29e26c0210d12fffb1c1d59e5bbcd492242801e/internal/controllers/machine/machine_controller_noderef.go#L109

Also, `ClusterNamespaceAnnotation` is not set at all.
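For concreteness, here is a small sketch of how a namespaced store key could be derived from those two annotations. The helper name is hypothetical, and I'm assuming the upstream annotation keys `cluster.x-k8s.io/machine` and `cluster.x-k8s.io/cluster-namespace`:

```go
// Hypothetical helper: build the "namespace/name" informer-store key
// for a Machine from the annotations cluster-api puts on a Node.
package sketch

import (
	corev1 "k8s.io/api/core/v1"
)

func machineKeyFromNode(node *corev1.Node) (string, bool) {
	// cluster-api sets only the bare machine name here, not namespace/name.
	name := node.Annotations["cluster.x-k8s.io/machine"]
	// The namespace comes from a separate annotation (when it is set).
	ns := node.Annotations["cluster.x-k8s.io/cluster-namespace"]
	if name == "" || ns == "" {
		return "", false
	}
	return ns + "/" + name, true
}
```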
So your logic in this case is correct, thanks for finding this out!
But to prevent such problems in the future, we need to update tests as well.
@elmiko WDYT? I can make a separate issue and update all tests.
@MaxFedotov thanks!
I tried updating the tests by doing the following change:
```diff
diff --git a/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_controller_test.go b/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_controller_test.go
index fc89fdae5..e374ea447 100644
--- a/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_controller_test.go
+++ b/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_controller_test.go
@@ -458,7 +458,8 @@ func makeLinkedNodeAndMachine(i int, namespace, clusterName string, owner metav1
 		ObjectMeta: metav1.ObjectMeta{
 			Name: fmt.Sprintf("%s-%s-node-%d", namespace, owner.Name, i),
 			Annotations: map[string]string{
-				machineAnnotationKey: fmt.Sprintf("%s/%s-%s-machine-%d", namespace, namespace, owner.Name, i),
+				machineAnnotationKey: fmt.Sprintf("%s-%s-machine-%d", namespace, owner.Name, i),
+				clusterNamespaceAnnotationKey: namespace,
 			},
 		},
 		Spec: corev1.NodeSpec{
```
As a result, `TestControllerFindMachineFromNodeAnnotation` (lines 1252-1257) fails because it can't find the machine: the `machineAnnotationKey` value no longer matches what is in `Node.Spec.ProviderID`. I'm unsure what to change there at the moment, because Azure is already a special case (`normalizedProviderString` treats VMSS specially) and I'm not sure what other providers do.
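To illustrate the kind of VMSS special-casing I mean, here is a rough sketch. `normalizeProviderID` is a hypothetical stand-in, not the actual `normalizedProviderString` implementation, and the lowercasing rule is my reading of why Azure needs special treatment:

```go
// Illustrative only: Azure VMSS provider IDs may differ in letter case
// between the Node and the Machine, so a case-normalized form is used
// for comparison.
package main

import (
	"fmt"
	"strings"
)

func normalizeProviderID(id string) string {
	// Hypothetical rule: only VMSS-backed Azure IDs are case-normalized.
	if strings.HasPrefix(id, "azure://") && strings.Contains(id, "virtualMachineScaleSets") {
		return strings.ToLower(id)
	}
	return id
}

func main() {
	a := "azure:///subscriptions/sub/resourceGroups/RG/providers/Microsoft.Compute/virtualMachineScaleSets/Agentpool0/virtualMachines/0"
	b := strings.ToLower(a)
	fmt.Println(normalizeProviderID(a) == normalizeProviderID(b)) // true
}
```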
Might keep digging a bit more, but in case you have an idea or want to do a follow-up, I'm happy with that too :)
/test pull-cluster-autoscaler-e2e-azure
@mweibel I'm maintaining some Azure-infra cluster-autoscaler tests, one scenario of which includes the clusterapi provider (running in Azure via CAPZ), so this test run will get some basic signal against your change.
@MaxFedotov I definitely think it would be cool to make the tests more accurate, although I don't want to break other stuff.
@mweibel thanks for the PR. I'm still working through the nuances that you and Max are discussing, but I will spend some time today/tomorrow reviewing this.
By the way, it would be great if somebody could give this an /ok-to-test. Thanks!
/ok-to-test
Apologies @mweibel, I lost track of this PR a little. I wonder if there were any further thoughts from @jackfrancis about the last comment?
Bumping this in the hope of getting a reply from @jackfrancis regarding what @elmiko asked :)
code lgtm
What's the latest on the viability of updating the tests as per the above conversation between @mweibel and @MaxFedotov?
@jackfrancis thanks for the reminder!
I had another look at the tests this morning and actually found a missing update to a `.findMachine()` call, thanks to updating the tests along the lines of what @MaxFedotov mentioned. Nice catch there :+1:
Would be great to get another review.
Given that I overlooked one `findMachine` call, I wonder if we should adjust the function's signature to accept a namespace/name pair or something similar to avoid such issues in the future.
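For example, something roughly like the sketch below; the types and method here are illustrative stand-ins, not the actual cluster-autoscaler code:

```go
// A sketch of the idea: accept a types.NamespacedName so the namespace
// part of the store key cannot be forgotten by callers.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/tools/cache"
)

type machineController struct {
	store cache.Store
}

// findMachine takes the full key instead of a bare name, so every
// caller supplies the namespace explicitly.
func (c *machineController) findMachine(id types.NamespacedName) (interface{}, bool, error) {
	// types.NamespacedName.String() renders "namespace/name", which is
	// exactly the key format cache.MetaNamespaceKeyFunc produces.
	return c.store.GetByKey(id.String())
}

func main() {
	c := &machineController{store: cache.NewStore(cache.MetaNamespaceKeyFunc)}
	_, found, _ := c.findMachine(types.NamespacedName{Namespace: "my-ns", Name: "my-machine"})
	fmt.Println(found) // false here (empty store), but the key shape is right
}
```

With a typed key like this, forgetting the namespace becomes a compile-time problem rather than a silent lookup miss.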
@mweibel I'll take a look this week, thanks for the update!
It seems like there are no further requests or objections, so I am going to approve this. Thanks everyone!
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: elmiko, mweibel
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~cluster-autoscaler/cloudprovider/clusterapi/OWNERS~~ [elmiko]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@elmiko thanks for merging! Can you take care of including this in the next patch release, too? 🙇
@mweibel Yes, I will try to make sure that it's in the next patch release.