
Update to Kubernetes 1.22

katyanna opened this issue 3 years ago • 24 comments

Kubernetes 1.22: https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md

  • Remove support for Ingress versions extensions/v1beta1 and networking.k8s.io/v1beta1
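
For anyone still carrying manifests on the removed versions, the migration to networking.k8s.io/v1 is mechanical. A minimal sketch for a single-backend Ingress (hypothetical names; only the changed fields are shown):

# Before (removed in v1.22)
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: example-ingress              # hypothetical name
spec:
  rules:
  - host: example.org
    http:
      paths:
      - path: /
        backend:
          serviceName: example-service
          servicePort: 80

# After (networking.k8s.io/v1, available since v1.19)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - host: example.org
    http:
      paths:
      - path: /
        pathType: Prefix             # pathType is required in v1
        backend:
          service:
            name: example-service
            port:
              number: 80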

katyanna avatar Jul 25 '22 08:07 katyanna

Resolved the merge conflicts and rebased on latest dev.

demonCoder95 avatar Sep 02 '22 16:09 demonCoder95

The AMI seems to be missing for k8s 1.22.13

Fail to provision: unable to read configuration defaults: template: main/cluster/config-defaults.yaml:554:32: executing \"main/cluster/config-defaults.yaml\" at <amiID \"zalando-ubuntu-kubernetes-production-v1.22.13-arm64-master-237\" \"861068367966\">: error calling amiID: no image found with name: zalando-ubuntu-kubernetes-production-v1.22.13-arm64-master-237 and owner: 861068367966

demonCoder95 avatar Sep 02 '22 17:09 demonCoder95

Create cluster is failing because a master node is not coming up. Upon further investigation, the kubelet service on the unhealthy node was failing due to the BoundServiceAccountTokenVolume feature gate:

"Failed to set feature gates from initial flags-based config" err="cannot set feature gate BoundServiceAccountTokenVolume to false, feature is locked to true"

Will try removing this feature gate to check the cluster build.
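
For context, the kubelet error above is about a flags-based feature gate, so the fix is simply to stop setting it. A sketch of the equivalent KubeletConfiguration fragment, assuming the gate were set there rather than via --feature-gates (other fields omitted):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # BoundServiceAccountTokenVolume: false   # delete: the gate is GA and locked to true in v1.22
  SizeMemoryBackedVolumes: true             # gates that are still configurable can stay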

demonCoder95 avatar Sep 05 '22 14:09 demonCoder95

Set the rotate_service_account_tokens config item to true to test the PR now.
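
For reference, a one-line sketch of the config item, assuming it sits alongside the other cluster configuration defaults (the exact file and quoting are assumptions):

# cluster configuration (hypothetical excerpt)
rotate_service_account_tokens: "true"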

demonCoder95 avatar Sep 06 '22 13:09 demonCoder95

The SetHostnameAsFQDN feature gate is stable (GA) in v1.22. Enabling this as well and updating the documentation link so the PR succeeds.

Reference: the error from the kubelet service on a master node:

"Failed to set feature gates from initial flags-based config" err="cannot set feature gate SetHostnameAsFQDN to false, feature is locked to true"

demonCoder95 avatar Sep 06 '22 15:09 demonCoder95

The master node comes up but is stuck in NotReady. The apiserver logs on the node show:

E0906 15:54:22.703888       1 available_controller.go:524] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.5.187.205:443/apis/metrics.k8s.io/v1beta1: Get "https://10.5.187.205:443/apis/metrics.k8s.io/v1beta1": context deadline exceeded
E0906 15:54:24.150030       1 controller.go:116] loading OpenAPI spec for "v1beta1.external.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: error trying to reach service: dial tcp 10.5.162.112:443: i/o timeout

The kubelet logs show failure of the network plugin:

"Error syncing pod, skipping" err="failed to \"StartContainer\" for \"ensure-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting>
"Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-proxy\" with CrashLoopBackOff: \"back-off 5m0s restarting faile>
"Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotRe>
"Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotRe>
"Unable to update cni config" err="no networks found in /etc/kubernetes/cni/net.d"

demonCoder95 avatar Sep 06 '22 16:09 demonCoder95

The kube-controller-manager is showing errors because of a flag that has been removed:

Error: unknown flag: --horizontal-pod-autoscaler-use-rest-clients
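
The flag no longer exists in v1.22, so the fix is to drop it from the kube-controller-manager manifest. A hypothetical excerpt of the pod spec command (the remaining flag is a placeholder, not the actual manifest):

command:
- kube-controller-manager
- --kubeconfig=/etc/kubernetes/controller-manager.conf    # placeholder for the existing flags
# - --horizontal-pod-autoscaler-use-rest-clients=true     # removed upstream: delete this line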

The kube-proxy was failing because its manifest still carried the older configuration with the BoundServiceAccountTokenVolume feature gate set to false.

Removed the flag and trying again.

demonCoder95 avatar Sep 07 '22 14:09 demonCoder95

The kube-controller-manager is not able to read the authorization token:

W0907 14:49:15.652914       1 authorization.go:225] failed to read in-cluster kubeconfig for delegated authorization: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory

demonCoder95 avatar Sep 07 '22 14:09 demonCoder95

kube-proxy crashing with the error:

ubuntu@ip-172-31-21-150:~$ docker logs 998c5d744ee0
I0913 12:15:18.059312       1 flags.go:59] FLAG: --add-dir-header="false"
I0913 12:15:18.059403       1 flags.go:59] FLAG: --alsologtostderr="false"
I0913 12:15:18.059410       1 flags.go:59] FLAG: --bind-address="0.0.0.0"
I0913 12:15:18.059417       1 flags.go:59] FLAG: --bind-address-hard-fail="false"
I0913 12:15:18.059424       1 flags.go:59] FLAG: --boot-id-file="/proc/sys/kernel/random/boot_id"
I0913 12:15:18.059436       1 flags.go:59] FLAG: --cleanup="false"
I0913 12:15:18.059441       1 flags.go:59] FLAG: --cluster-cidr=""
I0913 12:15:18.059450       1 flags.go:59] FLAG: --config="/config/kube-proxy.yaml"
I0913 12:15:18.059456       1 flags.go:59] FLAG: --config-sync-period="15m0s"
I0913 12:15:18.059462       1 flags.go:59] FLAG: --conntrack-max-per-core="32768"
I0913 12:15:18.059469       1 flags.go:59] FLAG: --conntrack-min="131072"
I0913 12:15:18.059474       1 flags.go:59] FLAG: --conntrack-tcp-timeout-close-wait="1h0m0s"
I0913 12:15:18.059479       1 flags.go:59] FLAG: --conntrack-tcp-timeout-established="24h0m0s"
I0913 12:15:18.059485       1 flags.go:59] FLAG: --detect-local-mode=""
I0913 12:15:18.059490       1 flags.go:59] FLAG: --feature-gates=""
I0913 12:15:18.059498       1 flags.go:59] FLAG: --healthz-bind-address="0.0.0.0:10256"
I0913 12:15:18.059504       1 flags.go:59] FLAG: --healthz-port="10256"
I0913 12:15:18.059509       1 flags.go:59] FLAG: --help="false"
I0913 12:15:18.059514       1 flags.go:59] FLAG: --hostname-override="ip-172-31-21-150.eu-central-1.compute.internal"
I0913 12:15:18.059523       1 flags.go:59] FLAG: --iptables-masquerade-bit="14"
I0913 12:15:18.059546       1 flags.go:59] FLAG: --iptables-min-sync-period="1s"
I0913 12:15:18.059551       1 flags.go:59] FLAG: --iptables-sync-period="30s"
I0913 12:15:18.059557       1 flags.go:59] FLAG: --ipvs-exclude-cidrs="[]"
I0913 12:15:18.059574       1 flags.go:59] FLAG: --ipvs-min-sync-period="0s"
I0913 12:15:18.059579       1 flags.go:59] FLAG: --ipvs-scheduler=""
I0913 12:15:18.059584       1 flags.go:59] FLAG: --ipvs-strict-arp="false"
I0913 12:15:18.059589       1 flags.go:59] FLAG: --ipvs-sync-period="30s"
I0913 12:15:18.059594       1 flags.go:59] FLAG: --ipvs-tcp-timeout="0s"
I0913 12:15:18.059598       1 flags.go:59] FLAG: --ipvs-tcpfin-timeout="0s"
I0913 12:15:18.059603       1 flags.go:59] FLAG: --ipvs-udp-timeout="0s"
I0913 12:15:18.059608       1 flags.go:59] FLAG: --kube-api-burst="10"
I0913 12:15:18.059613       1 flags.go:59] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
I0913 12:15:18.059624       1 flags.go:59] FLAG: --kube-api-qps="5"
I0913 12:15:18.059631       1 flags.go:59] FLAG: --kubeconfig=""
I0913 12:15:18.059636       1 flags.go:59] FLAG: --log-backtrace-at=":0"
I0913 12:15:18.059665       1 flags.go:59] FLAG: --log-dir=""
I0913 12:15:18.059671       1 flags.go:59] FLAG: --log-file=""
I0913 12:15:18.059676       1 flags.go:59] FLAG: --log-file-max-size="1800"
I0913 12:15:18.059683       1 flags.go:59] FLAG: --log-flush-frequency="5s"
I0913 12:15:18.059688       1 flags.go:59] FLAG: --logtostderr="true"
I0913 12:15:18.059693       1 flags.go:59] FLAG: --machine-id-file="/etc/machine-id,/var/lib/dbus/machine-id"
I0913 12:15:18.059699       1 flags.go:59] FLAG: --masquerade-all="false"
I0913 12:15:18.059704       1 flags.go:59] FLAG: --master=""
I0913 12:15:18.059709       1 flags.go:59] FLAG: --metrics-bind-address="127.0.0.1:10249"
I0913 12:15:18.059714       1 flags.go:59] FLAG: --metrics-port="10249"
I0913 12:15:18.059719       1 flags.go:59] FLAG: --nodeport-addresses="[]"
I0913 12:15:18.059730       1 flags.go:59] FLAG: --one-output="false"
I0913 12:15:18.059735       1 flags.go:59] FLAG: --oom-score-adj="-999"
I0913 12:15:18.059740       1 flags.go:59] FLAG: --profiling="false"
I0913 12:15:18.059744       1 flags.go:59] FLAG: --proxy-mode=""
I0913 12:15:18.059755       1 flags.go:59] FLAG: --proxy-port-range=""
I0913 12:15:18.059761       1 flags.go:59] FLAG: --show-hidden-metrics-for-version=""
I0913 12:15:18.059765       1 flags.go:59] FLAG: --skip-headers="false"
I0913 12:15:18.059770       1 flags.go:59] FLAG: --skip-log-headers="false"
I0913 12:15:18.059775       1 flags.go:59] FLAG: --stderrthreshold="2"
I0913 12:15:18.059779       1 flags.go:59] FLAG: --udp-timeout="250ms"
I0913 12:15:18.059784       1 flags.go:59] FLAG: --v="2"
I0913 12:15:18.059789       1 flags.go:59] FLAG: --version="false"
I0913 12:15:18.059814       1 flags.go:59] FLAG: --vmodule=""
I0913 12:15:18.059820       1 flags.go:59] FLAG: --write-config-to=""
W0913 12:15:18.062662       1 server.go:435] using lenient decoding as strict decoding failed: strict decoder error for apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 0.0.0.0
clientConnection:
  acceptContentTypes: ""
  burst: 10
  contentType: application/vnd.kubernetes.protobuf
  qps: 5
clusterCIDR: ""
configSyncPeriod: 15m0s
conntrack:
  maxPerCore: 131072
  min: 524288
  tcpCloseWaitTimeout: 1h0m0s
  tcpEstablishedTimeout: 24h0m0s
enableProfiling: false
featureGates:
  BoundServiceAccountTokenVolume: false
  EndpointSliceProxying: true
  SizeMemoryBackedVolumes: true
healthzBindAddress: 127.0.0.1:10256
hostnameOverride: ""
iptables:
  masqueradeAll: false
  masqueradeBit: 14
  minSyncPeriod: 0s
  syncPeriod: 30s
ipvs:
  minSyncPeriod: 0s
  scheduler: ""
  syncPeriod: 30s
kind: KubeProxyConfiguration
metricsBindAddress: 127.0.0.1:10249
mode: iptables
oomScoreAdj: -999
portRange: ""
resourceContainer: /kube-proxy
udpIdleTimeout: 250ms
: v1alpha1.KubeProxyConfiguration.UDPIdleTimeout: ReadObject: found unknown field: resourceContainer, error found in #10 byte of ...|Container":"/kube-pr|..., bigger context ...|mScoreAdj":-999,"portRange":"","resourceContainer":"/kube-proxy","udpIdleTimeout":"250ms"}|...
F0913 12:15:18.067233       1 server.go:486] failed complete: cannot set feature gate BoundServiceAccountTokenVolume to false, feature is locked to true
goroutine 1 [running]:
k8s.io/kubernetes/vendor/k8s.io/klog/v2.stacks(0xc0000bc001, 0xc00002c870, 0x99, 0xeb)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1026 +0xb9
k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).output(0x2d7f840, 0xc000000003, 0x0, 0x0, 0xc0005beee0, 0x0, 0x252f7bd, 0x9, 0x1e6, 0x0)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:975 +0x1e5
k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).printf(0x2d7f840, 0xc000000003, 0x0, 0x0, 0x0, 0x0, 0x1d7d64b, 0x13, 0xc0005c0080, 0x1, ...)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:753 +0x19a
k8s.io/kubernetes/vendor/k8s.io/klog/v2.Fatalf(...)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1514
k8s.io/kubernetes/cmd/kube-proxy/app.NewProxyCommand.func1(0xc00055f900, 0xc00031f530, 0x0, 0x3)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kube-proxy/app/server.go:486 +0x135
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute(0xc00055f900, 0xc0000c0050, 0x3, 0x3, 0xc00055f900, 0xc0000c0050)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc00055f900, 0xc000088180, 0x2d7f400, 0x0)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:960 +0x375
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute(...)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:897
main.main()
        _output/dockerized/go/src/k8s.io/kubernetes/cmd/kube-proxy/proxy.go:48 +0x109

goroutine 18 [chan receive]:
k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).flushDaemon(0x2d7f840)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1169 +0x8b
created by k8s.io/kubernetes/vendor/k8s.io/klog/v2.init.0
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:420 +0xdf

goroutine 32 [select]:
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x1e3d6b0, 0x1fb4de0, 0xc00033a270, 0x1, 0xc00008c360)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:167 +0x118
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x1e3d6b0, 0x12a05f200, 0x0, 0x1, 0xc00008c360)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until(...)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90
k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Forever(0x1e3d6b0, 0x12a05f200)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:81 +0x4f
created by k8s.io/kubernetes/vendor/k8s.io/component-base/logs.InitLogs
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/component-base/logs/logs.go:58 +0x8a

goroutine 33 [runnable]:
k8s.io/kubernetes/vendor/github.com/fsnotify/fsnotify.(*Watcher).readEvents(0xc00008a5a0)
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/fsnotify/fsnotify/inotify.go:172
created by k8s.io/kubernetes/vendor/github.com/fsnotify/fsnotify.NewWatcher
        /workspace/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/fsnotify/fsnotify/inotify.go:59 +0x1ab

demonCoder95 avatar Sep 13 '22 12:09 demonCoder95

Manually edited the kube-proxy-config ConfigMap to set the feature gate BoundServiceAccountTokenVolume: true, and kube-proxy is working now.
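
A sketch of the edit against the KubeProxyConfiguration dumped in the log above; only the featureGates block changes, and the field rejected by the strict decoder can be dropped at the same time:

featureGates:
  BoundServiceAccountTokenVolume: true   # was false; the gate is GA and locked to true in v1.22
  EndpointSliceProxying: true
  SizeMemoryBackedVolumes: true
# resourceContainer: /kube-proxy         # unknown field per the strict-decoding warning; safe to remove

Removing the gate entirely (as the later rebase does) works just as well, since true is the only value the locked gate accepts.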

demonCoder95 avatar Sep 13 '22 12:09 demonCoder95

controller-manager is crashing with the error:

ubuntu@ip-172-31-21-150:~$ docker logs fcea13fb45e7
W0913 12:28:25.537130       1 feature_gate.go:237] Setting GA feature gate SetHostnameAsFQDN=true. It will be removed in a future release.
W0913 12:28:25.537211       1 feature_gate.go:237] Setting GA feature gate BoundServiceAccountTokenVolume=true. It will be removed in a future release.
I0913 12:28:26.051432       1 serving.go:347] Generated self-signed cert in-memory
W0913 12:28:26.764183       1 authentication.go:419] failed to read in-cluster kubeconfig for delegated authentication: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0913 12:28:26.764237       1 authentication.go:316] No authentication-kubeconfig provided in order to lookup client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication won't work.
W0913 12:28:26.764248       1 authentication.go:340] No authentication-kubeconfig provided in order to lookup requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
W0913 12:28:26.764266       1 authorization.go:225] failed to read in-cluster kubeconfig for delegated authorization: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0913 12:28:26.764286       1 authorization.go:193] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.
I0913 12:28:26.764308       1 controllermanager.go:186] Version: v1.22.13-zalando-master-82-dirty
I0913 12:28:26.765326       1 secure_serving.go:200] Serving securely on [::]:10257
I0913 12:28:26.765547       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0913 12:28:26.766288       1 leaderelection.go:248] attempting to acquire leader lease kube-system/kube-controller-manager...

Update: It appears the liveness probe on the controller-manager is failing with connection refused.

demonCoder95 avatar Sep 13 '22 12:09 demonCoder95

Updated the kube-controller-manager liveness probe settings to use port 10257 and scheme HTTPS.
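
A sketch of the adjusted probe, assuming an httpGet probe against the secure serving endpoint reported in the log above (path and timings are assumptions):

livenessProbe:
  httpGet:
    host: 127.0.0.1
    path: /healthz
    port: 10257      # secure serving port, per "Serving securely on [::]:10257"
    scheme: HTTPS
  initialDelaySeconds: 15
  timeoutSeconds: 15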

demonCoder95 avatar Sep 14 '22 14:09 demonCoder95

Discovered that the kube-proxy update is not applied until after the control plane provisioning for v1.22 is finished. This causes a chicken-and-egg problem, since the update is required in order to roll out the v1.22 control plane. Created a PR (above) that removes the flag from kube-proxy-config beforehand, so the flag is already gone before the update is rolled out and v1.22 control plane provisioning can succeed.

demonCoder95 avatar Sep 14 '22 16:09 demonCoder95

Observed the node rollout to be successful. The following kube-system components are stuck in CrashLoopBackOff:

  1. external-dns, with error:
nmalik@ZALANDO-70804 ~ % kubectl logs external-dns-68c8f9dd76-lwlf9 -nkube-system
time="2022-09-14T17:12:31Z" level=info msg="config: {APIServerURL: KubeConfig: RequestTimeout:30s DefaultTargets:[] ContourLoadBalancerService:heptio-contour/contour GlooNamespace:gloo-system SkipperRouteGroupVersion:zalando.org/v1 Sources:[service ingress skipper-routegroup] Namespace: AnnotationFilter:external-dns.alpha.kubernetes.io/exclude notin (true) LabelFilter: FQDNTemplate: CombineFQDNAndAnnotation:false IgnoreHostnameAnnotation:false IgnoreIngressTLSSpec:false IgnoreIngressRulesSpec:false Compatibility: PublishInternal:false PublishHostIP:false AlwaysPublishNotReadyAddresses:false ConnectorSourceServer:localhost:8080 Provider:aws GoogleProject: GoogleBatchChangeSize:1000 GoogleBatchChangeInterval:1s GoogleZoneVisibility: DomainFilter:[] ExcludeDomains:[cluster.local] RegexDomainFilter: RegexDomainExclusion: ZoneNameFilter:[] ZoneIDFilter:[] AlibabaCloudConfigFile:/etc/kubernetes/alibaba-cloud.json AlibabaCloudZoneType: AWSZoneType: AWSZoneTagFilter:[] AWSAssumeRole: AWSBatchChangeSize:100 AWSBatchChangeInterval:1s AWSEvaluateTargetHealth:true AWSAPIRetries:3 AWSPreferCNAME:false AWSZoneCacheDuration:0s AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: AzureSubscriptionID: AzureUserAssignedIdentityClientID: BluecatConfigFile:/etc/kubernetes/bluecat.json CloudflareProxied:false CloudflareZonesPerPage:50 CoreDNSPrefix:/skydns/ RcodezeroTXTEncrypt:false AkamaiServiceConsumerDomain: AkamaiClientToken: AkamaiClientSecret: AkamaiAccessToken: AkamaiEdgercPath: AkamaiEdgercSection: InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true InfobloxView: InfobloxMaxResults:0 InfobloxFQDNRegEx: DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 OCIConfigFile:/etc/kubernetes/oci.yaml InMemoryZones:[] OVHEndpoint:ovh-eu OVHApiRateLimit:20 PDNSServer:http://localhost:8081 PDNSAPIKey: PDNSTLSEnabled:false TLSCA: TLSClientCert: TLSClientCertKey: Policy:sync Registry:txt TXTOwnerID:eu-central-1:e2e-pr-5269-37 TXTPrefix: TXTSuffix: Interval:1m0s MinEventSyncInterval:5s Once:false DryRun:false UpdateEvents:false LogFormat:text MetricsAddress::7979 LogLevel:info TXTCacheInterval:0s TXTWildcardReplacement: ExoscaleEndpoint:https://api.exoscale.ch/dns ExoscaleAPIKey: ExoscaleAPISecret: CRDSourceAPIVersion:externaldns.k8s.io/v1alpha1 CRDSourceKind:DNSEndpoint ServiceTypeFilter:[] CFAPIEndpoint: CFUsername: CFPassword: RFC2136Host: RFC2136Port:0 RFC2136Zone: RFC2136Insecure:false RFC2136GSSTSIG:false RFC2136KerberosRealm: RFC2136KerberosUsername: RFC2136KerberosPassword: RFC2136TSIGKeyName: RFC2136TSIGSecret: RFC2136TSIGSecretAlg: RFC2136TAXFR:false RFC2136MinTTL:0s RFC2136BatchChangeSize:50 NS1Endpoint: NS1IgnoreSSL:false NS1MinTTLSeconds:0 TransIPAccountName: TransIPPrivateKeyFile: DigitalOceanAPIPageSize:50 ManagedDNSRecordTypes:[A CNAME] GoDaddyAPIKey: GoDaddySecretKey: GoDaddyTTL:0 GoDaddyOTE:false}"
time="2022-09-14T17:12:31Z" level=info msg="Instantiating new Kubernetes client"
time="2022-09-14T17:12:31Z" level=info msg="Using inCluster-config based on serviceaccount-token"
time="2022-09-14T17:12:31Z" level=info msg="Created Kubernetes client https://10.5.0.1:443"
time="2022-09-14T17:13:32Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"
  2. vpa-admission-controller: container failed to create, with no more information in pod logs or description.

The create cluster step succeeded anyway.

demonCoder95 avatar Sep 14 '22 17:09 demonCoder95

This should address the external-dns issue: https://github.com/zalando-incubator/kubernetes-on-aws/pull/5354

mikkeloscar avatar Sep 14 '22 17:09 mikkeloscar

Rebased on dev to remove the feature flag from kube-proxy-config.

demonCoder95 avatar Sep 15 '22 08:09 demonCoder95

e2e testing fails due to external-dns and vertical-pod-autoscaler issues. Those are being upgraded. Work will resume on this PR once those upgrades are finished.

vpa upgrade: https://github.bus.zalan.do/teapot/issues/issues/3328
external-dns upgrade: #5354

demonCoder95 avatar Sep 15 '22 09:09 demonCoder95

Upgraded to v1.22.14. Also restored the kuberuntu version for v1.21 (which had been downgraded by mistake). This new rebase also includes the vertical-pod-autoscaler upgrade, so the vpa is not expected to cause issues now :)

demonCoder95 avatar Sep 16 '22 10:09 demonCoder95

The following pods are in CrashLoopBackOff in the e2e test:

  1. deployment-status-service (3 pods) - getting OOMKilled.
  2. external-dns (1 pod) - this is expected to be resolved with the upgrade.
  3. kube-janitor (1 pod) - getting OOMKilled.

(1) seems to have healed itself with time. Looking into the kube-janitor OOM kill.
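
If the kube-janitor OOM kill turns out not to be transient, the usual mitigation is to raise its memory request/limit. A hypothetical sketch (the values are assumptions, not the actual deployment settings):

resources:
  requests:
    cpu: 5m
    memory: 100Mi
  limits:
    memory: 200Mi    # raise this if the pod keeps getting OOMKilled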

demonCoder95 avatar Sep 16 '22 12:09 demonCoder95

Retriggering seems to have fixed the transient issues observed with deployment-status-service and kube-janitor. Only external-dns remains in CrashLoopBackOff, which is expected until its upgrade is complete. Putting this PR on hold until the external-dns upgrade is finished.

demonCoder95 avatar Sep 16 '22 16:09 demonCoder95

Rebased after external-dns upgrade to v0.12.2.

demonCoder95 avatar Sep 20 '22 08:09 demonCoder95

Failure in 5 upstream conformance tests:

20/09/2022 10:18:39 Summarizing 5 Failures:
20/09/2022 10:18:39
20/09/2022 10:18:39 [Fail] [sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] [It] should mutate custom resource with different stored version [Conformance]
20/09/2022 10:18:39 /workspace/test/e2e/e2e_modules/kubernetes/test/e2e/apimachinery/webhook.go:1965
20/09/2022 10:18:39
20/09/2022 10:18:39 [Fail] [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] [It] works for multiple CRDs of different groups [Conformance]
20/09/2022 10:18:39 /workspace/test/e2e/e2e_modules/kubernetes/test/e2e/apimachinery/crd_publish_openapi.go:564
20/09/2022 10:18:39
20/09/2022 10:18:39 [Fail] [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] [It] works for CRD with validation schema [Conformance]
20/09/2022 10:18:39 /go/pkg/mod/github.com/onsi/[email protected]/internal/leafnodes/runner.go:113
20/09/2022 10:18:39
20/09/2022 10:18:39 [Fail] [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] [It] works for multiple CRDs of same group and version but different kinds [Conformance]
20/09/2022 10:18:39 /workspace/test/e2e/e2e_modules/kubernetes/test/e2e/apimachinery/crd_publish_openapi.go:564
20/09/2022 10:18:39
20/09/2022 10:18:39 [Fail] [sig-api-machinery] CustomResourcePublishOpenAPI [Privileged:ClusterAdmin] [It] works for multiple CRDs of different groups [Conformance]
20/09/2022 10:18:39 /workspace/test/e2e/e2e_modules/kubernetes/test/e2e/apimachinery/crd_publish_openapi.go:564
20/09/2022 10:18:39
20/09/2022 10:18:39 Ran 364 of 6441 Specs in 1175.241 seconds
20/09/2022 10:18:39 FAIL! -- 363 Passed | 1 Failed | 3 Flaked | 0 Pending | 6077 Skipped

demonCoder95 avatar Sep 20 '22 10:09 demonCoder95

Taking a closer look at the failing e2e conformance tests by running them locally. https://github.com/zalando-incubator/kubernetes-on-aws/blob/dev/test/e2e/README.md#running-the-tests

demonCoder95 avatar Sep 20 '22 12:09 demonCoder95

Pipeline is 🟢 for the first time! 😮

demonCoder95 avatar Oct 06 '22 10:10 demonCoder95

Rebased the PR on latest dev.

demonCoder95 avatar Oct 17 '22 08:10 demonCoder95

Updated to AMI based on Kubernetes v1.22.16, also rebased on dev.

mikkeloscar avatar Nov 28 '22 10:11 mikkeloscar

:+1:

gargravarr avatar Feb 20 '23 14:02 gargravarr

:+1:

mikkeloscar avatar Feb 20 '23 14:02 mikkeloscar