kubernetes-on-aws
Update to Kubernetes 1.22
Kubernetes 1.22: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md
- Removed support for Ingress API versions extensions/v1beta1 and networking.k8s.io/v1beta1
Resolved the merge conflicts and rebased on latest dev.
The AMI seems to be missing for k8s 1.22.13
Fail to provision: unable to read configuration defaults: template: main/cluster/config-defaults.yaml:554:32: executing \"main/cluster/config-defaults.yaml\" at <amiID \"zalando-ubuntu-kubernetes-production-v1.22.13-arm64-master-237\" \"861068367966\">: error calling amiID: no image found with name: zalando-ubuntu-kubernetes-production-v1.22.13-arm64-master-237 and owner: 861068367966
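One way to confirm the AMI really is absent is a one-off EC2 query for the exact image name and owner from the error. This is a command-fragment sketch: it needs AWS credentials with ec2:DescribeImages, and the region (eu-central-1) is an assumption based on the node hostnames later in this thread.

```shell
# Hypothetical one-off check; requires AWS credentials. Region is an assumption.
aws ec2 describe-images \
  --region eu-central-1 \
  --owners 861068367966 \
  --filters "Name=name,Values=zalando-ubuntu-kubernetes-production-v1.22.13-arm64-master-237" \
  --query 'Images[].ImageId'
```

An empty `[]` result would confirm the "no image found" error.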
Cluster creation is failing because a master node does not come up. Upon further investigation, the kubelet service on the unhealthy node was failing due to a BoundServiceAccountTokenVolume issue.
"Failed to set feature gates from initial flags-based config" err="cannot set feature gate BoundServiceAccountTokenVolume to false, feature is locked to true"
Will try removing this feature flag to check the cluster build.
Set the rotate_service_account_tokens config item to true to test the PR now.
The SetHostnameAsFQDN feature gate is stable in v1.22. Enabling this as well and updating the documentation link for the PR to succeed.
Reference: Error in a master node kubelet service:
"Failed to set feature gates from initial flags-based config" err="cannot set feature gate SetHostnameAsFQDN to false, feature is locked to true"
The master node comes up but is stuck in NotReady. The api-server logs on the node show:
E0906 15:54:22.703888 1 available_controller.go:524] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.5.187.205:443/apis/metrics.k8s.io/v1beta1: Get "https://10.5.187.205:443/apis/metrics.k8s.io/v1beta1": context deadline exceeded
E0906 15:54:24.150030 1 controller.go:116] loading OpenAPI spec for "v1beta1.external.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: error trying to reach service: dial tcp 10.5.162.112:443: i/o timeout
The kubelet logs show failure of the network plugin:
"Error syncing pod, skipping" err="failed to \"StartContainer\" for \"ensure-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting>
"Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-proxy\" with CrashLoopBackOff: \"back-off 5m0s restarting faile>
"Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotRe>
"Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotRe>
"Unable to update cni config" err="no networks found in /etc/kubernetes/cni/net.d"
The kube-controller-manager is showing errors because of a deprecated flag:
Error: unknown flag: --horizontal-pod-autoscaler-use-rest-clients
The kube-proxy was failing because its manifest erroneously carried over the older config with BoundServiceAccountTokenVolume set to false.
Removed the flag and trying again.
The kube-controller-manager is not able to get the authz tokens:
W0907 14:49:15.652914 1 authorization.go:225] failed to read in-cluster kubeconfig for delegated authorization: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
kube-proxy crashing with the error:
ubuntu@ip-172-31-21-150:~$ docker logs 998c5d744ee0
I0913 12:15:18.059312 1 flags.go:59] FLAG: --add-dir-header="false"
I0913 12:15:18.059403 1 flags.go:59] FLAG: --alsologtostderr="false"
I0913 12:15:18.059410 1 flags.go:59] FLAG: --bind-address="0.0.0.0"
I0913 12:15:18.059417 1 flags.go:59] FLAG: --bind-address-hard-fail="false"
I0913 12:15:18.059424 1 flags.go:59] FLAG: --boot-id-file="/proc/sys/kernel/random/boot_id"
I0913 12:15:18.059436 1 flags.go:59] FLAG: --cleanup="false"
I0913 12:15:18.059441 1 flags.go:59] FLAG: --cluster-cidr=""
I0913 12:15:18.059450 1 flags.go:59] FLAG: --config="/config/kube-proxy.yaml"
I0913 12:15:18.059456 1 flags.go:59] FLAG: --config-sync-period="15m0s"
I0913 12:15:18.059462 1 flags.go:59] FLAG: --conntrack-max-per-core="32768"
I0913 12:15:18.059469 1 flags.go:59] FLAG: --conntrack-min="131072"
I0913 12:15:18.059474 1 flags.go:59] FLAG: --conntrack-tcp-timeout-close-wait="1h0m0s"
I0913 12:15:18.059479 1 flags.go:59] FLAG: --conntrack-tcp-timeout-established="24h0m0s"
I0913 12:15:18.059485 1 flags.go:59] FLAG: --detect-local-mode=""
I0913 12:15:18.059490 1 flags.go:59] FLAG: --feature-gates=""
I0913 12:15:18.059498 1 flags.go:59] FLAG: --healthz-bind-address="0.0.0.0:10256"
I0913 12:15:18.059504 1 flags.go:59] FLAG: --healthz-port="10256"
I0913 12:15:18.059509 1 flags.go:59] FLAG: --help="false"
I0913 12:15:18.059514 1 flags.go:59] FLAG: --hostname-override="ip-172-31-21-150.eu-central-1.compute.internal"
I0913 12:15:18.059523 1 flags.go:59] FLAG: --iptables-masquerade-bit="14"
I0913 12:15:18.059546 1 flags.go:59] FLAG: --iptables-min-sync-period="1s"
I0913 12:15:18.059551 1 flags.go:59] FLAG: --iptables-sync-period="30s"
I0913 12:15:18.059557 1 flags.go:59] FLAG: --ipvs-exclude-cidrs="[]"
I0913 12:15:18.059574 1 flags.go:59] FLAG: --ipvs-min-sync-period="0s"
I0913 12:15:18.059579 1 flags.go:59] FLAG: --ipvs-scheduler=""
I0913 12:15:18.059584 1 flags.go:59] FLAG: --ipvs-strict-arp="false"
I0913 12:15:18.059589 1 flags.go:59] FLAG: --ipvs-sync-period="30s"
I0913 12:15:18.059594 1 flags.go:59] FLAG: --ipvs-tcp-timeout="0s"
I0913 12:15:18.059598 1 flags.go:59] FLAG: --ipvs-tcpfin-timeout="0s"
I0913 12:15:18.059603 1 flags.go:59] FLAG: --ipvs-udp-timeout="0s"
I0913 12:15:18.059608 1 flags.go:59] FLAG: --kube-api-burst="10"
I0913 12:15:18.059613 1 flags.go:59] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
I0913 12:15:18.059624 1 flags.go:59] FLAG: --kube-api-qps="5"
I0913 12:15:18.059631 1 flags.go:59] FLAG: --kubeconfig=""
I0913 12:15:18.059636 1 flags.go:59] FLAG: --log-backtrace-at=":0"
I0913 12:15:18.059665 1 flags.go:59] FLAG: --log-dir=""
I0913 12:15:18.059671 1 flags.go:59] FLAG: --log-file=""
I0913 12:15:18.059676 1 flags.go:59] FLAG: --log-file-max-size="1800"
I0913 12:15:18.059683 1 flags.go:59] FLAG: --log-flush-frequency="5s"
I0913 12:15:18.059688 1 flags.go:59] FLAG: --logtostderr="true"
I0913 12:15:18.059693 1 flags.go:59] FLAG: --machine-id-file="/etc/machine-id,/var/lib/dbus/machine-id"
I0913 12:15:18.059699 1 flags.go:59] FLAG: --masquerade-all="false"
I0913 12:15:18.059704 1 flags.go:59] FLAG: --master=""
I0913 12:15:18.059709 1 flags.go:59] FLAG: --metrics-bind-address="127.0.0.1:10249"
I0913 12:15:18.059714 1 flags.go:59] FLAG: --metrics-port="10249"
I0913 12:15:18.059719 1 flags.go:59] FLAG: --nodeport-addresses="[]"
I0913 12:15:18.059730 1 flags.go:59] FLAG: --one-output="false"
I0913 12:15:18.059735 1 flags.go:59] FLAG: --oom-score-adj="-999"
I0913 12:15:18.059740 1 flags.go:59] FLAG: --profiling="false"
I0913 12:15:18.059744 1 flags.go:59] FLAG: --proxy-mode=""
I0913 12:15:18.059755 1 flags.go:59] FLAG: --proxy-port-range=""
I0913 12:15:18.059761 1 flags.go:59] FLAG: --show-hidden-metrics-for-version=""
I0913 12:15:18.059765 1 flags.go:59] FLAG: --skip-headers="false"
I0913 12:15:18.059770 1 flags.go:59] FLAG: --skip-log-headers="false"
I0913 12:15:18.059775 1 flags.go:59] FLAG: --stderrthreshold="2"
I0913 12:15:18.059779 1 flags.go:59] FLAG: --udp-timeout="250ms"
I0913 12:15:18.059784 1 flags.go:59] FLAG: --v="2"
I0913 12:15:18.059789 1 flags.go:59] FLAG: --version="false"
I0913 12:15:18.059814 1 flags.go:59] FLAG: --vmodule=""
I0913 12:15:18.059820 1 flags.go:59] FLAG: --write-config-to=""
W0913 12:15:18.062662 1 server.go:435] using lenient decoding as strict decoding failed: strict decoder error for apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 0.0.0.0
clientConnection:
  acceptContentTypes: ""
  burst: 10
  contentType: application/vnd.kubernetes.protobuf
  qps: 5
clusterCIDR: ""
configSyncPeriod: 15m0s
conntrack:
  maxPerCore: 131072
  min: 524288
  tcpCloseWaitTimeout: 1h0m0s
  tcpEstablishedTimeout: 24h0m0s
enableProfiling: false
featureGates:
  BoundServiceAccountTokenVolume: false
  EndpointSliceProxying: true
  SizeMemoryBackedVolumes: true
healthzBindAddress: 127.0.0.1:10256
hostnameOverride: ""
iptables:
  masqueradeAll: false
  masqueradeBit: 14
  minSyncPeriod: 0s
  syncPeriod: 30s
ipvs:
  minSyncPeriod: 0s
  scheduler: ""
  syncPeriod: 30s
kind: KubeProxyConfiguration
metricsBindAddress: 127.0.0.1:10249
mode: iptables
oomScoreAdj: -999
portRange: ""
resourceContainer: /kube-proxy
udpIdleTimeout: 250ms
: v1alpha1.KubeProxyConfiguration.UDPIdleTimeout: ReadObject: found unknown field: resourceContainer, error found in #10 byte of ...|Container":"/kube-pr|..., bigger context ...|mScoreAdj":-999,"portRange":"","resourceContainer":"/kube-proxy","udpIdleTimeout":"250ms"}|...
F0913 12:15:18.067233 1 server.go:486] failed complete: cannot set feature gate BoundServiceAccountTokenVolume to false, feature is locked to true
goroutine 1 [running]:
k8s.io/kubernetes/vendor/k8s.io/klog/v2.stacks(0xc0000bc001, 0xc00002c870, 0x99, 0xeb)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1026 +0xb9
k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).output(0x2d7f840, 0xc000000003, 0x0, 0x0, 0xc0005beee0, 0x0, 0x252f7bd, 0x9, 0x1e6, 0x0)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:975 +0x1e5
k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).printf(0x2d7f840, 0xc000000003, 0x0, 0x0, 0x0, 0x0, 0x1d7d64b, 0x13, 0xc0005c0080, 0x1, ...)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:753 +0x19a
k8s.io/kubernetes/vendor/k8s.io/klog/v2.Fatalf(...)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1514
k8s.io/kubernetes/cmd/kube-proxy/app.NewProxyCommand.func1(0xc00055f900, 0xc00031f530, 0x0, 0x3)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kube-proxy/app/server.go:486 +0x135
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute(0xc00055f900, 0xc0000c0050, 0x3, 0x3, 0xc00055f900, 0xc0000c0050)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc00055f900, 0xc000088180, 0x2d7f400, 0x0)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:960 +0x375
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute(...)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:897
main.main()
_output/dockerized/go/src/k8s.io/kubernetes/cmd/kube-proxy/proxy.go:48 +0x109
goroutine 18 [chan receive]:
k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).flushDaemon(0x2d7f840)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1169 +0x8b
created by k8s.io/kubernetes/vendor/k8s.io/klog/v2.init.0
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:420 +0xdf
goroutine 32 [select]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x1e3d6b0, 0x1fb4de0, 0xc00033a270, 0x1, 0xc00008c360)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:167 +0x118
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x1e3d6b0, 0x12a05f200, 0x0, 0x1, 0xc00008c360)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until(...)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Forever(0x1e3d6b0, 0x12a05f200)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:81 +0x4f
created by k8s.io/kubernetes/vendor/k8s.io/component-base/logs.InitLogs
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/component-base/logs/logs.go:58 +0x8a
goroutine 33 [runnable]:
k8s.io/kubernetes/vendor/github.com/fsnotify/fsnotify.(*Watcher).readEvents(0xc00008a5a0)
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/fsnotify/fsnotify/inotify.go:172
created by k8s.io/kubernetes/vendor/github.com/fsnotify/fsnotify.NewWatcher
/workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/fsnotify/fsnotify/inotify.go:59 +0x1ab
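The failure in this log is two-stage: strict decoding of the KubeProxyConfiguration rejects the removed resourceContainer field, kube-proxy falls back to lenient decoding (which drops unknown fields), and only then dies on the locked BoundServiceAccountTokenVolume gate. A toy model of that decode step (KNOWN_FIELDS is an illustrative subset, not the real schema):

```python
# Hedged toy model of strict vs. lenient config decoding.
# KNOWN_FIELDS is an illustrative subset of KubeProxyConfiguration.
KNOWN_FIELDS = {"bindAddress", "mode", "featureGates", "oomScoreAdj", "udpIdleTimeout"}

def decode(config: dict, strict: bool) -> dict:
    unknown = sorted(set(config) - KNOWN_FIELDS)
    if unknown:
        if strict:
            raise ValueError("found unknown field: " + unknown[0])
        # Lenient mode drops unknown fields instead of failing.
        config = {k: v for k, v in config.items() if k in KNOWN_FIELDS}
    return config
```

This explains why the unknown field only produced a warning ("using lenient decoding as strict decoding failed") while the locked gate produced the fatal exit.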
Manually edited the kube-proxy-config ConfigMap to set the feature flag BoundServiceAccountTokenVolume: true and kube-proxy is working now.
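The manual fix amounts to flipping one line inside the embedded kube-proxy.yaml. As a sketch of the text edit itself (the real change was applied via kubectl edit, not code):

```python
def fix_bound_sa_gate(config_text: str) -> str:
    """Flip the locked gate from false to true in the kube-proxy config text."""
    return config_text.replace(
        "BoundServiceAccountTokenVolume: false",
        "BoundServiceAccountTokenVolume: true",
    )
```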
controller-manager is crashing with the error:
ubuntu@ip-172-31-21-150:~$ docker logs fcea13fb45e7
W0913 12:28:25.537130 1 feature_gate.go:237] Setting GA feature gate SetHostnameAsFQDN=true. It will be removed in a future release.
W0913 12:28:25.537211 1 feature_gate.go:237] Setting GA feature gate BoundServiceAccountTokenVolume=true. It will be removed in a future release.
I0913 12:28:26.051432 1 serving.go:347] Generated self-signed cert in-memory
W0913 12:28:26.764183 1 authentication.go:419] failed to read in-cluster kubeconfig for delegated authentication: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0913 12:28:26.764237 1 authentication.go:316] No authentication-kubeconfig provided in order to lookup client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication won't work.
W0913 12:28:26.764248 1 authentication.go:340] No authentication-kubeconfig provided in order to lookup requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
W0913 12:28:26.764266 1 authorization.go:225] failed to read in-cluster kubeconfig for delegated authorization: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0913 12:28:26.764286 1 authorization.go:193] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.
I0913 12:28:26.764308 1 controllermanager.go:186] Version: v1.22.13-zalando-master-82-dirty
I0913 12:28:26.765326 1 secure_serving.go:200] Serving securely on [::]:10257
I0913 12:28:26.765547 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0913 12:28:26.766288 1 leaderelection.go:248] attempting to acquire leader lease kube-system/kube-controller-manager...
Update: It appears the liveness probe is failing on the controller manager (connection refused).
Updated kube-controller-manager liveness probe settings with port 10257 and scheme https.
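In manifest terms, the updated probe would look roughly like this (a sketch: only the port and scheme come from the note above; the path and timings are assumptions):

```yaml
livenessProbe:
  httpGet:
    path: /healthz   # assumed endpoint
    port: 10257      # kube-controller-manager secure port
    scheme: HTTPS
  initialDelaySeconds: 15  # illustrative values
  timeoutSeconds: 15
```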
Discovered that the kube-proxy update is not applied until after the v1.22 control plane provisioning finishes. This creates a chicken-and-egg problem, since the update is required in order to roll out the v1.22 control plane. Created a PR (above) that removes the flag from kube-proxy-config before an update is rolled out, ensuring v1.22 control plane provisioning succeeds.
Observed the node rollout to be successful. The following kube-system components are stuck in CrashLoopBackOff:
external-dns, with error:
nmalik@ZALANDO-70804 ~ % kubectl logs external-dns-68c8f9dd76-lwlf9 -nkube-system
time="2022-09-14T17:12:31Z" level=info msg="config: {APIServerURL: KubeConfig: RequestTimeout:30s DefaultTargets:[] ContourLoadBalancerService:heptio-contour/contour GlooNamespace:gloo-system SkipperRouteGroupVersion:zalando.org/v1 Sources:[service ingress skipper-routegroup] Namespace: AnnotationFilter:external-dns.alpha.kubernetes.io/exclude notin (true) LabelFilter: FQDNTemplate: CombineFQDNAndAnnotation:false IgnoreHostnameAnnotation:false IgnoreIngressTLSSpec:false IgnoreIngressRulesSpec:false Compatibility: PublishInternal:false PublishHostIP:false AlwaysPublishNotReadyAddresses:false ConnectorSourceServer:localhost:8080 Provider:aws GoogleProject: GoogleBatchChangeSize:1000 GoogleBatchChangeInterval:1s GoogleZoneVisibility: DomainFilter:[] ExcludeDomains:[cluster.local] RegexDomainFilter: RegexDomainExclusion: ZoneNameFilter:[] ZoneIDFilter:[] AlibabaCloudConfigFile:/etc/kubernetes/alibaba-cloud.json AlibabaCloudZoneType: AWSZoneType: AWSZoneTagFilter:[] AWSAssumeRole: AWSBatchChangeSize:100 AWSBatchChangeInterval:1s AWSEvaluateTargetHealth:true AWSAPIRetries:3 AWSPreferCNAME:false AWSZoneCacheDuration:0s AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: AzureSubscriptionID: AzureUserAssignedIdentityClientID: BluecatConfigFile:/etc/kubernetes/bluecat.json CloudflareProxied:false CloudflareZonesPerPage:50 CoreDNSPrefix:/skydns/ RcodezeroTXTEncrypt:false AkamaiServiceConsumerDomain: AkamaiClientToken: AkamaiClientSecret: AkamaiAccessToken: AkamaiEdgercPath: AkamaiEdgercSection: InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true InfobloxView: InfobloxMaxResults:0 InfobloxFQDNRegEx: DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 OCIConfigFile:/etc/kubernetes/oci.yaml InMemoryZones:[] OVHEndpoint:ovh-eu OVHApiRateLimit:20 PDNSServer:http://localhost:8081 PDNSAPIKey: PDNSTLSEnabled:false TLSCA: TLSClientCert: TLSClientCertKey: Policy:sync Registry:txt 
TXTOwnerID:eu-central-1:e2e-pr-5269-37 TXTPrefix: TXTSuffix: Interval:1m0s MinEventSyncInterval:5s Once:false DryRun:false UpdateEvents:false LogFormat:text MetricsAddress::7979 LogLevel:info TXTCacheInterval:0s TXTWildcardReplacement: ExoscaleEndpoint:https://api.exoscale.ch/dns ExoscaleAPIKey: ExoscaleAPISecret: CRDSourceAPIVersion:externaldns.k8s.io/v1alpha1 CRDSourceKind:DNSEndpoint ServiceTypeFilter:[] CFAPIEndpoint: CFUsername: CFPassword: RFC2136Host: RFC2136Port:0 RFC2136Zone: RFC2136Insecure:false RFC2136GSSTSIG:false RFC2136KerberosRealm: RFC2136KerberosUsername: RFC2136KerberosPassword: RFC2136TSIGKeyName: RFC2136TSIGSecret: RFC2136TSIGSecretAlg: RFC2136TAXFR:false RFC2136MinTTL:0s RFC2136BatchChangeSize:50 NS1Endpoint: NS1IgnoreSSL:false NS1MinTTLSeconds:0 TransIPAccountName: TransIPPrivateKeyFile: DigitalOceanAPIPageSize:50 ManagedDNSRecordTypes:[A CNAME] GoDaddyAPIKey: GoDaddySecretKey: GoDaddyTTL:0 GoDaddyOTE:false}"
time="2022-09-14T17:12:31Z" level=info msg="Instantiating new Kubernetes client"
time="2022-09-14T17:12:31Z" level=info msg="Using inCluster-config based on serviceaccount-token"
time="2022-09-14T17:12:31Z" level=info msg="Created Kubernetes client https://10.5.0.1:443"
time="2022-09-14T17:13:32Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"
vpa-admission-controller: the container failed to create, with no further information in the pod logs or description.
The create cluster step succeeded anyway.
This should address the external-dns issue: https://github.com/zalando-incubator/kubernetes-on-aws/pull/5354
Rebased on dev to remove the feature flag from kube-proxy-config.
e2e testing fails due to external-dns and vertical-pod-autoscaler issues. Those are being upgraded. Work will resume on this PR once those upgrades are finished.
vpa upgrade: https://github.bus.zalan.do/teapot/issues/issues/3328 external-dns upgrade: #5354
Upgraded to v1.22.14. Also, restored the kuberuntu version for v1.21 (which got downgraded mistakenly). This new rebase also includes the vertical-pod-autoscaler upgrade so expecting to not see the vpa cause issues now :)
The following pods are in CrashLoopBackOff in the e2e test:
deployment-status-service (3 pods) - getting OOMKilled.
external-dns (1 pod) - expected to be resolved with the upgrade.
kube-janitor (1 pod) - getting OOMKilled.
The first one (deployment-status-service) seems to have healed itself with time. Looking into the kube-janitor OOM kill.
Retriggering seems to have fixed the transient issues observed with deployment-status-service and kube-janitor. Only external-dns remains in CrashLoopBackOff, which is expected to be the case until the upgrade is complete. Putting this PR work on hold until the external-dns upgrade is finished.
Rebased after external-dns upgrade to v0.12.2.
Failure in 5 upstream conformance tests:
20/09/2022 10:18:39 Summarizing 5 Failures:
20/09/2022 10:18:39
20/09/2022 10:18:39 [Fail] [sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] [It] should mutate custom resource with different stored version [Conformance]
20/09/2022 10:18:39 /workspace/test/e2e/e2e_modules/kubernetes/test/e2e/apimachinery/webhook.go:1965
20/09/2022 10:18:39
20/09/2022 10:18:39 [Fail] [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] [It] works for multiple CRDs of different groups [Conformance]
20/09/2022 10:18:39 /workspace/test/e2e/e2e_modules/kubernetes/test/e2e/apimachinery/crd_publish_openapi.go:564
20/09/2022 10:18:39
20/09/2022 10:18:39 [Fail] [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] [It] works for CRD with validation schema [Conformance]
20/09/2022 10:18:39 /go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:113
20/09/2022 10:18:39
20/09/2022 10:18:39 [Fail] [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] [It] works for multiple CRDs of same group and version but different kinds [Conformance]
20/09/2022 10:18:39 /workspace/test/e2e/e2e_modules/kubernetes/test/e2e/apimachinery/crd_publish_openapi.go:564
20/09/2022 10:18:39
20/09/2022 10:18:39 [Fail] [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] [It] works for multiple CRDs of different groups [Conformance]
20/09/2022 10:18:39 /workspace/test/e2e/e2e_modules/kubernetes/test/e2e/apimachinery/crd_publish_openapi.go:564
20/09/2022 10:18:39
20/09/2022 10:18:39 Ran 364 of 6441 Specs in 1175.241 seconds
20/09/2022 10:18:39 FAIL! -- 363 Passed | 1 Failed | 3 Flaked | 0 Pending | 6077 Skipped
Taking a closer look at the failing e2e conformance tests by running them locally. https://github.com/zalando-incubator/kubernetes-on-aws/blob/dev/test/e2e/README.md#running-the-tests
Pipeline is 🟢 for the first time! 😮
Rebased the PR on latest dev.
Updated to AMI based on Kubernetes v1.22.16, also rebased on dev.
:+1: