cloud-provider-azure
Failed ILB during capz E2E
What happened:
While investigating a test flake I noticed an ILB error that may be interesting:
{ Failure /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/azure_test.go:434
Timed out after 900.111s.
Service default/web1zqon9-ilb failed
Service:
{
  "metadata": {
    "name": "web1zqon9-ilb",
    "namespace": "default",
    "uid": "f1eaabf3-cb5d-4859-b786-4f1909e62928",
    "resourceVersion": "1177",
    "creationTimestamp": "2022-04-26T09:30:19Z",
    "annotations": {
      "service.beta.kubernetes.io/azure-load-balancer-internal": "true"
    },
    "finalizers": [
      "service.kubernetes.io/load-balancer-cleanup"
    ],
    "managedFields": [
      {
        "manager": "cloud-controller-manager",
        "operation": "Update",
        "apiVersion": "v1",
        "time": "2022-04-26T09:30:19Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:metadata": {
            "f:finalizers": {
              ".": {},
              "v:\"service.kubernetes.io/load-balancer-cleanup\"": {}
            }
          }
        },
        "subresource": "status"
      },
      {
        "manager": "cluster-api-e2e",
        "operation": "Update",
        "apiVersion": "v1",
        "time": "2022-04-26T09:30:19Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:metadata": {
            "f:annotations": {
              ".": {},
              "f:service.beta.kubernetes.io/azure-load-balancer-internal": {}
            }
          },
          "f:spec": {
            "f:allocateLoadBalancerNodePorts": {},
            "f:externalTrafficPolicy": {},
            "f:internalTrafficPolicy": {},
            "f:ports": {
              ".": {},
              "k:{\"port\":80,\"protocol\":\"TCP\"}": {
                ".": {},
                "f:name": {},
                "f:port": {},
                "f:protocol": {},
                "f:targetPort": {}
              },
              "k:{\"port\":443,\"protocol\":\"TCP\"}": {
                ".": {},
                "f:name": {},
                "f:port": {},
                "f:protocol": {},
                "f:targetPort": {}
              }
            },
            "f:selector": {},
            "f:sessionAffinity": {},
            "f:type": {}
          }
        }
      }
    ]
  },
  "spec": {
    "ports": [
      {
        "name": "http",
        "protocol": "TCP",
        "port": 80,
        "targetPort": 80,
        "nodePort": 32348
      },
      {
        "name": "https",
        "protocol": "TCP",
        "port": 443,
        "targetPort": 443,
        "nodePort": 30461
      }
    ],
    "selector": {
      "app": "web1zqon9"
    },
    "clusterIP": "10.98.217.212",
    "clusterIPs": [
      "10.98.217.212"
    ],
    "type": "LoadBalancer",
    "sessionAffinity": "None",
    "externalTrafficPolicy": "Cluster",
    "ipFamilies": [
      "IPv4"
    ],
    "ipFamilyPolicy": "SingleStack",
    "allocateLoadBalancerNodePorts": true,
    "internalTrafficPolicy": "Cluster"
  },
  "status": {
    "loadBalancer": {}
  }
}
LAST SEEN TYPE REASON OBJECT MESSAGE
2022-04-26 09:40:34 +0000 UTC Normal EnsuringLoadBalancer service/web1zqon9-ilb Ensuring load balancer
2022-04-26 09:40:34 +0000 UTC Warning SyncLoadBalancerFailed service/web1zqon9-ilb Error syncing load balancer: failed to ensure load balancer: reconcileSharedLoadBalancer: failed to list LB: ListManagedLBs: failed to get agent pool vmSet names: GetAgentPoolVMSetNames: failed to execute getAgentPoolScaleSets: not a vmss instance
Full test results are here:
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/periodic-cluster-api-provider-azure-e2e-full-main/1518865402938003456
What you expected to happen:
Normally, the ILB test passes fine.
How to reproduce it (as minimally and precisely as possible):
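The flake itself is timing-dependent, so there is no reliable repro; the service setup can be recreated outside the e2e suite, though. Below is a minimal client-go sketch (not the actual azure_test.go code; the names, annotation, and ports are copied from the dump above, and the kubeconfig path is an assumption):

// Sketch only: create an internal LB Service equivalent to the one in the dump above.
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes ~/.kube/config points at the workload cluster (assumption for illustration).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "web1zqon9-ilb",
			Namespace: "default",
			Annotations: map[string]string{
				// This annotation is what asks the cloud provider for an internal LB.
				"service.beta.kubernetes.io/azure-load-balancer-internal": "true",
			},
		},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeLoadBalancer,
			Selector: map[string]string{"app": "web1zqon9"},
			Ports: []corev1.ServicePort{
				{Name: "http", Protocol: corev1.ProtocolTCP, Port: 80, TargetPort: intstr.FromInt(80)},
				{Name: "https", Protocol: corev1.ProtocolTCP, Port: 443, TargetPort: intstr.FromInt(443)},
			},
		},
	}

	created, err := cs.CoreV1().Services(svc.Namespace).Create(context.TODO(), svc, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("created service", created.Name)
}

Creating the service this way exercises the same ILB reconcile path, but hitting the failure presumably also depends on the state of the nodes at the time the sync runs.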
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): v1.22.9 running OOT v1.1.13
- Cloud provider or hardware configuration: Azure
- OS (e.g. cat /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Network plugin and version (if this is a network-related bug):
- Others:
Basically, the interesting thing to me is the VMSS failure immediately after the EnsuringLoadBalancer event:
2022-04-26 09:40:34 +0000 UTC Normal EnsuringLoadBalancer service/web1zqon9-ilb Ensuring load balancer
2022-04-26 09:40:34 +0000 UTC Warning SyncLoadBalancerFailed service/web1zqon9-ilb Error syncing load balancer: failed to ensure load balancer: reconcileSharedLoadBalancer: failed to list LB: ListManagedLBs: failed to get agent pool vmSet names: GetAgentPoolVMSetNames: failed to execute getAgentPoolScaleSets: not a vmss instance
What might that indicate?
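The error string suggests the VMSS-backed VMSet implementation could not map one of the nodes to a scale set while listing managed LBs. A hypothetical sketch of that kind of check follows; this is not the actual cloud-provider-azure code, and the providerID values are made up for illustration:

// Hypothetical sketch: a scale-set lookup that assumes VMSS-backed nodes and
// fails with a "not a vmss instance"-style error when a node's providerID
// does not match the scale-set format.
package main

import (
	"errors"
	"fmt"
	"regexp"
)

var errNotVMSSInstance = errors.New("not a vmss instance")

// Matches providerIDs of the form
// azure:///subscriptions/.../virtualMachineScaleSets/<name>/virtualMachines/<instance-id>.
var vmssProviderIDRE = regexp.MustCompile(`/virtualMachineScaleSets/(.+)/virtualMachines/(?:\d+)$`)

// scaleSetNameFromProviderID extracts the scale set name; a single node that
// does not parse would abort the whole listing, as the event message implies.
func scaleSetNameFromProviderID(providerID string) (string, error) {
	m := vmssProviderIDRE.FindStringSubmatch(providerID)
	if m == nil {
		return "", fmt.Errorf("provider ID %q: %w", providerID, errNotVMSSInstance)
	}
	return m[1], nil
}

func main() {
	ids := []string{
		// VMSS-backed worker: parses fine.
		"azure:///subscriptions/sub/resourceGroups/rg/providers/Microsoft.Compute/virtualMachineScaleSets/capz-mp-0/virtualMachines/0",
		// Availability-set VM (or an empty providerID): fails the parse.
		"azure:///subscriptions/sub/resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/capz-control-plane-abcde",
	}
	for _, id := range ids {
		name, err := scaleSetNameFromProviderID(id)
		fmt.Println(name, err)
	}
}

If that is roughly what happens here, a node backed by something other than a VMSS, or a node whose providerID had not been populated yet when the sync ran, would be enough to fail the reconcile, which would fit a flake during cluster bring-up.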
cc @feiskyer @CecileRobertMichon
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
We haven't seen any similar issues recently. Please update with more details and reopen if you still hit this.
/close
@feiskyer: Closing this issue.
In response to this:
We haven't seen any similar issues recently. Please update with more details and reopen if you still hit this.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.