kops export --internal gives error on 1.26.3
/kind bug
1. What kops version are you running? The command kops version will display this information.
1.26.3
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
1.26.3
3. What cloud provider are you using?
aws
4. What commands did you run? What is the simplest way to reproduce this issue?
kops export kubecfg
E0612 11:43:20.331105 19527 memcache.go:265] couldn't get current server API group list: Get "https://api.internal.testcluster-api.control.example.com:8443/api?timeout=32s": dial tcp 192.168.0.98:8443: connect: connection refused
6. What did you expect to happen?
To be able to get pods on the cluster.
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2023-06-05T14:54:56Z"
  name: testcluster-api.control.example.com
spec:
  api:
    loadBalancer:
      class: Network
      sslCertificate: arn:aws:acm:eu-east-6:12345678:certificate/46456456456-erg45t54-45645g45-343
      type: Internal
  authorization:
    rbac: {}
  channel: stable
  cloudLabels:
    Owner: testcluster
    kops: testcluster.control.example.com
  cloudProvider: aws
  configBase: s3://testcluster/testcluster.control.example.com
  containerRuntime: containerd
  encryptionConfig: true
  iam:
    legacy: false
  kubeControllerManager:
    enableProfiling: false
    featureGates:
      RotateKubeletServerCertificate: "true"
    terminatedPodGCThreshold: 10
  kubeScheduler:
    enableProfiling: false
  kubernetesVersion: v1.26.3
  masterPublicName: testcluster-api.control.example.com
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
9. Anything else do we need to know? We changed the masterPublicName in the config.
Please explain the problem in detail:
- What is the last version of kOps it worked with?
- What is the difference between the kubeconfig generated by the working version versus the one generated by 1.26.3?
- What is IP 192.168.0.98?
- How are you trying to connect to 192.168.0.98 (internal network)?
- ....
The last version it worked with was 1.25.4. The IP is just a fake IP. The cluster was created with the 1.26.3 binary and then the config was exported with the 1.25.4 binary. In that case it worked, so there were no changes in the config.
When I tried to export with 1.26.3 I got this error.
> Cluster was created with the 1.26.3 binary and then the config was exported with the 1.25.4 binary. In that case it worked, so there were no changes in the config.
What is the difference in ~/.kube/config when exported with 1.25.4 vs 1.26.3?
The only difference is:
On 1.25.4 the config contains:
clusters:
- cluster:
    server: https://internal.cluster-api.control.example.com:8443
On 1.26.3:
clusters:
- cluster:
    server: https://api.internal.cluster-api.control.example.com:8443
Where did https://internal.cluster-api.control.example.com:8443 come from?
Do you also change MasterInternalName somewhere?
In the config we set masterInternalName and masterPublicName, but when I do kops update cluster I don't see the internal one in the final config.
I still don't understand where you get the https://internal.cluster-api.control.example.com:8443.
The internal address of the API server is api.internal.<CLUSTER_NAME>.
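For the cluster name in this report, that convention would give a server entry along the following lines in the exported kubeconfig (a sketch only; the exact port depends on how the API endpoint is exposed):

```yaml
clusters:
- cluster:
    server: https://api.internal.testcluster-api.control.example.com
  name: testcluster-api.control.example.com
```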
With Ansible we set the following lines after we created the cluster config with kops create cluster:
masterPublicName: https://cluster-api.control.example.com:8443
masterInternalName: https://internal.cluster-api.control.example.com:8443
After that, a kops update cluster applies all the settings.
As of 1.26, changing masterInternalName is no longer allowed. kops export kubecfg --admin --internal will always set server: https://api.internal.cluster-api.control.example.com.
Please try to use additionalSANs instead, if that's possible with your workflow.
https://pkg.go.dev/k8s.io/kops/pkg/apis/kops#APISpec
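For reference, a minimal sketch of what that could look like in a v1alpha2 Cluster manifest; as it turns out further down in this thread, the field sits directly under spec (not under spec.api), and it only adds extra names to the API server certificate:

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: testcluster-api.control.example.com
spec:
  # extra Subject Alternative Names to include in the API server certificate
  additionalSANs:
  - internal.testcluster-api.control.example.com
```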
Last time I tried that, but had a problem with the export... maybe at that time we could not export using the SANs address only?
Is that possible? Our real goal is to have API access without an LB, only using NodePorts.
kOps will only set and update the DNS record for api.internal.cluster-api.control.example.com. It is a subdomain of what you are already using, so it should work just fine. Not sure if there's other logic in play though.
https://github.com/kubernetes/kops/blob/feedb1b2bbc07ed151a1ac6f0372cbb36358ea65/nodeup/pkg/model/kube_apiserver.go#L759-L761
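As a rough illustration of what the linked code does (and matching the kube-apiserver pod output shown later in this thread), the kube-apiserver static pod carries an annotation that dns-controller turns into the internal DNS record; a sketch only, with the hostname derived from this issue's cluster name:

```yaml
# Sketch: the annotation kops places on the kube-apiserver static pod,
# which dns-controller uses to publish the internal API record.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
  annotations:
    dns.alpha.kubernetes.io/internal: api.internal.testcluster-api.control.example.com
```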
The api subdomain would be OK for us as well, but as far as I can see, the export uses the DNS name of the LB and not api.internal.cluster-api.control.example.com with port 8443. Any idea?
Do you also set UseForInternalAPI somewhere?
No. Probably that is the missing part?
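If that is the missing piece, a minimal sketch of where it would go, assuming the v1alpha2 field name useForInternalApi as listed in the kops API reference:

```yaml
spec:
  api:
    loadBalancer:
      class: Network
      type: Internal
      # point api.internal.<cluster name> at the load balancer
      # instead of at the control-plane node IPs
      useForInternalApi: true
```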
Please paste the cluster spec with all the fields. I don't think what's in the main comment is complete. For example, topology is missing.
The create cluster command would also be useful.
config:

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2023-06-21T08:22:34Z"
  name: testcluster-api.control.example.com
spec:
  rollingUpdate:
    maxSurge: 50%
  kubeControllerManager:
    enableProfiling: false
    terminatedPodGCThreshold: 10
    featureGates:
      RotateKubeletServerCertificate: "true"
  kubeScheduler:
    enableProfiling: false
  kubeAPIServer:
    enableProfiling: false
    auditLogMaxAge: 30
    auditLogMaxBackups: 10
    auditLogMaxSize: 100
    requestTimeout: 3m0s
    auditLogPath: /var/log/apiserver/audit.log
    auditPolicyFile: /srv/kubernetes/kube-apiserver/audit-policy-config.yaml
    admissionControlConfigFile: /srv/kubernetes/kube-apiserver/admission-configuration.yaml
    appendAdmissionPlugins:
    - EventRateLimit
  masterPublicName: testcluster-api.control.example.com
  masterInternalName: internal.testcluster-api.control.example.com
  assets:
    containerProxy: "external-docker-registry"
  additionalPolicies:
    master: |
      [
        {
          "Effect": "Allow",
          "Action": [
            "kms:Decrypt",
            "kms:Encrypt"
          ],
          "Resource": [
            "aws_arn_key"
          ]
        }
      ]
  fileAssets:
  - name: aws-encryption-provider.yaml
    path: /etc/kubernetes/manifests/aws-encryption-provider.yaml
    roles:
    - ControlPlane
    content: |
      apiVersion: v1
      kind: Pod
      metadata:
        annotations:
          scheduler.alpha.kubernetes.io/critical-pod: ""
        labels:
          k8s-app: aws-encryption-provider
        name: aws-encryption-provider
        namespace: kube-system
      spec:
        # since runs on a master node, pull-secret needs to be available immediately.
        # we're using the kops create secret dockerconfig. alternatively, we could use
        # a public image
        containers:
        - image: external-docker-registry/aws-encryption-provider:tag
          name: aws-encryption-provider
          command:
          - /aws-encryption-provider
          - --key=aws_arn_key
          - --region=aws_region
          - --listen=/srv/kubernetes/kube-apiserver/socket.sock
          - --health-port=:8083
          ports:
          - containerPort: 8083
            protocol: TCP
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8083
          volumeMounts:
          - mountPath: /srv/kubernetes/kube-apiserver
            name: kmsplugin
        hostNetwork: true
        priorityClassName: system-cluster-critical
        # we're using this volume for other files too
        volumes:
        - name: kmsplugin
          hostPath:
            path: /srv/kubernetes/kube-apiserver
            type: DirectoryOrCreate
  - content: |
      apiVersion: eventratelimit.admission.k8s.io/v1alpha1
      kind: Configuration
      limits:
      - type: Namespace
        qps: 50
        burst: 100
        cacheSize: 2000
      - type: User
        qps: 10
        burst: 50
    name: event-rate-configuration
    path: /srv/kubernetes/kube-apiserver/eventconfig.yaml
    roles:
    - ControlPlane
  - content: |
      apiVersion: apiserver.config.k8s.io/v1
      kind: AdmissionConfiguration
      plugins:
      - name: EventRateLimit
        path: /srv/kubernetes/kube-apiserver/eventconfig.yaml
    name: admission-configuration
    path: /srv/kubernetes/kube-apiserver/admission-configuration.yaml
    roles:
    - ControlPlane
  - name: audit-policy-config
    path: /srv/kubernetes/kube-apiserver/audit-policy-config.yaml
    roles:
    - ControlPlane
    content: |
      apiVersion: audit.k8s.io/v1 # This is required.
      kind: Policy
      # Don't generate audit events for all requests in RequestReceived stage.
      omitStages:
      - "RequestReceived"
      rules:
      # Log pod changes at RequestResponse level
      - level: RequestResponse
        resources:
        - group: ""
          # Resource "pods" doesn't match requests to any subresource of pods,
          # which is consistent with the RBAC policy.
          resources: ["pods", "deployments"]
      - level: RequestResponse
        resources:
        - group: "rbac.authorization.k8s.io"
          resources: ["clusterroles", "clusterrolebindings"]
      # Log "pods/log", "pods/status" at Metadata level
      - level: Metadata
        resources:
        - group: ""
          resources: ["pods/log", "pods/status"]
      # Don't log requests to a configmap called "controller-leader"
      - level: None
        resources:
        - group: ""
          resources: ["configmaps"]
          resourceNames: ["controller-leader"]
      # Don't log watch requests by the "system:kube-proxy" on endpoints or services
      - level: None
        users: ["system:kube-proxy"]
        verbs: ["watch"]
        resources:
        - group: "" # core API group
          resources: ["endpoints", "services"]
      # Don't log authenticated requests to certain non-resource URL paths.
      - level: None
        userGroups: ["system:authenticated"]
        nonResourceURLs:
        - "/api*" # Wildcard matching.
        - "/version"
      # Log the request body of configmap changes in kube-system.
      - level: Request
        resources:
        - group: "" # core API group
          resources: ["configmaps"]
        # This rule only applies to resources in the "kube-system" namespace.
        # The empty string "" can be used to select non-namespaced resources.
        namespaces: ["kube-system"]
      # Log configmap changes in all other namespaces at the RequestResponse level.
      - level: RequestResponse
        resources:
        - group: "" # core API group
          resources: ["configmaps"]
      # Log secret changes in all namespaces at the Metadata level.
      - level: Metadata
        resources:
        - group: "" # core API group
          resources: ["secrets"]
      # Log all other resources in core and extensions at the Request level.
      - level: Request
        resources:
        - group: "" # core API group
        - group: "extensions" # Version of group should NOT be included.
      # A catch-all rule to log all other requests at the Metadata level.
      - level: Metadata
        # Long-running requests like watches that fall under this rule will not
        # generate an audit event in RequestReceived.
        omitStages:
        - "RequestReceived"
  encryptionConfig: true
  api:
    loadBalancer:
      class: Network
      sslCertificate: aws_ssl_cert
      type: Internal
      UseForInternalAPI: true
  authorization:
    rbac: {}
  channel: stable
  cloudLabels:
    Owner: company
    delete: ""
    kops: testcluster-api.control.example.com
  cloudProvider: aws
  configBase: s3://config/testcluster-api.control.example.com
  containerRuntime: containerd
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      kmsKeyId: aws_kms_key_id
      instanceGroup: control-plane-aws_region
      name: a
    memoryRequest: 100Mi
    manager:
      env:
      - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
        value: 60d
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      kmsKeyId: aws_kms_key_id
      instanceGroup: control-plane-aws_region
      name: a
    memoryRequest: 100Mi
    manager:
      env:
      - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
        value: 60d
    name: events
  iam:
    legacy: false
  kubelet:
    anonymousAuth: false
    evictionMaxPodGracePeriod: 120
    evictionSoftGracePeriod: memory.available=30s,imagefs.available=30s,nodefs.available=30s,imagefs.inodesFree=30s,nodefs.inodesFree=30s
    evictionSoft: memory.available<250Mi,nodefs.available<15%,nodefs.inodesFree<10%,imagefs.available<20%,imagefs.inodesFree<10%
    evictionHard: memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<15%,imagefs.inodesFree<5%
    readOnlyPort: 0
    eventQPS: 0
    featureGates: { RotateKubeletServerCertificate: "true" }
    kernelMemcgNotification: true
    protectKernelDefaults: true
    authorizationMode: Webhook
    authenticationTokenWebhook: true
  kubernetesApiAccess:
  - 192.168.0.0/24
  kubernetesVersion: v1.26.3
  networkCIDR: 192.168.2/23
  networkID: vpc-34567juytgrfed
  networking:
    calico: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 192.168.0.0/24
  subnets:
  - cidr: 192.168.2.0/26
    id: subnet-040987b99943b6cec
    name: aws_region
    type: Private
    zone: aws_region
  - cidr: 192.168.2.192/26
    id: subnet-091cac4d38a2ee136
    name: utility-aws_region
    type: Utility
    zone: aws_region
  topology:
    dns:
      type: Private
    masters: private
    nodes: private
cluster.spec:

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2023-06-21T08:22:34Z"
  name: testcluster-api.control.example.com
spec:
  additionalPolicies:
    master: |
      [
        {
          "Effect": "Allow",
          "Action": [
            "kms:Decrypt",
            "kms:Encrypt"
          ],
          "Resource": [
            "aws_arn_key"
          ]
        }
      ]
  api:
    loadBalancer:
      class: Network
      sslCertificate: aws_ssl_cert
      type: Internal
  assets:
    containerProxy: external-docker-registry
  authorization:
    rbac: {}
  channel: stable
  cloudConfig:
    awsEBSCSIDriver:
      enabled: true
      version: v1.14.1
    manageStorageClasses: true
  cloudControllerManager:
    allocateNodeCIDRs: true
    clusterCIDR: 100.64.0.0/10
    clusterName: testcluster-api.control.example.com
    configureCloudRoutes: false
    image: registry.k8s.io/provider-aws/cloud-controller-manager:v1.26.0
    leaderElection:
      leaderElect: true
  cloudLabels:
    Owner: company
    delete: ""
    kops: testcluster-api.control.example.com
  cloudProvider: aws
  clusterDNSDomain: cluster.local
  configBase: s3://config/testcluster-api.control.example.com
  configStore: s3://config/testcluster-api.control.example.com
  containerRuntime: containerd
  containerd:
    logLevel: info
    runc:
      version: 1.1.4
    version: 1.6.18
  dnsZone: aws_dns_zone
  docker:
    skipInstall: true
  encryptionConfig: true
  etcdClusters:
  - backups:
      backupStore: s3://config/testcluster-api.control.example.com/backups/etcd/main
    cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-aws-region
      kmsKeyId: aws_kms_key_id
      name: a
    manager:
      env:
      - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
    memoryRequest: 100Mi
    name: main
    version: 3.5.7
  - backups:
      backupStore: s3://config/testcluster-api.control.example.com/backups/etcd/events
    cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-aws_region
      kmsKeyId: aws_kms_key_id
      name: a
    manager:
      env:
      - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
    memoryRequest: 100Mi
    name: events
    version: 3.5.7
  externalDns:
    provider: dns-controller
  fileAssets:
  - content: |
      apiVersion: v1
      kind: Pod
      metadata:
        annotations:
          scheduler.alpha.kubernetes.io/critical-pod: ""
        labels:
          k8s-app: aws-encryption-provider
        name: aws-encryption-provider
        namespace: kube-system
      spec:
        # since runs on a master node, pull-secret needs to be available immediately.
        # we're using the kops create secret dockerconfig. alternatively, we could use
        # a public image
        containers:
        - image: external-docker_registry/aws-encryption-provider:tag
          name: aws-encryption-provider
          command:
          - /aws-encryption-provider
          - --key=aws_arn_key
          - --region=us-east-2
          - --listen=/srv/kubernetes/kube-apiserver/socket.sock
          - --health-port=:8083
          ports:
          - containerPort: 8083
            protocol: TCP
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8083
          volumeMounts:
          - mountPath: /srv/kubernetes/kube-apiserver
            name: kmsplugin
        hostNetwork: true
        priorityClassName: system-cluster-critical
        # we're using this volume for other files too
        volumes:
        - name: kmsplugin
          hostPath:
            path: /srv/kubernetes/kube-apiserver
            type: DirectoryOrCreate
    name: aws-encryption-provider.yaml
    path: /etc/kubernetes/manifests/aws-encryption-provider.yaml
    roles:
    - ControlPlane
  - content: |
      apiVersion: eventratelimit.admission.k8s.io/v1alpha1
      kind: Configuration
      limits:
      - type: Namespace
        qps: 50
        burst: 100
        cacheSize: 2000
      - type: User
        qps: 10
        burst: 50
    name: event-rate-configuration
    path: /srv/kubernetes/kube-apiserver/eventconfig.yaml
    roles:
    - ControlPlane
  - content: |
      apiVersion: apiserver.config.k8s.io/v1
      kind: AdmissionConfiguration
      plugins:
      - name: EventRateLimit
        path: /srv/kubernetes/kube-apiserver/eventconfig.yaml
    name: admission-configuration
    path: /srv/kubernetes/kube-apiserver/admission-configuration.yaml
    roles:
    - ControlPlane
  - content: |
      apiVersion: audit.k8s.io/v1 # This is required.
      kind: Policy
      # Don't generate audit events for all requests in RequestReceived stage.
      omitStages:
      - "RequestReceived"
      rules:
      # Log pod changes at RequestResponse level
      - level: RequestResponse
        resources:
        - group: ""
          # Resource "pods" doesn't match requests to any subresource of pods,
          # which is consistent with the RBAC policy.
          resources: ["pods", "deployments"]
      - level: RequestResponse
        resources:
        - group: "rbac.authorization.k8s.io"
          resources: ["clusterroles", "clusterrolebindings"]
      # Log "pods/log", "pods/status" at Metadata level
      - level: Metadata
        resources:
        - group: ""
          resources: ["pods/log", "pods/status"]
      # Don't log requests to a configmap called "controller-leader"
      - level: None
        resources:
        - group: ""
          resources: ["configmaps"]
          resourceNames: ["controller-leader"]
      # Don't log watch requests by the "system:kube-proxy" on endpoints or services
      - level: None
        users: ["system:kube-proxy"]
        verbs: ["watch"]
        resources:
        - group: "" # core API group
          resources: ["endpoints", "services"]
      # Don't log authenticated requests to certain non-resource URL paths.
      - level: None
        userGroups: ["system:authenticated"]
        nonResourceURLs:
        - "/api*" # Wildcard matching.
        - "/version"
      # Log the request body of configmap changes in kube-system.
      - level: Request
        resources:
        - group: "" # core API group
          resources: ["configmaps"]
        # This rule only applies to resources in the "kube-system" namespace.
        # The empty string "" can be used to select non-namespaced resources.
        namespaces: ["kube-system"]
      # Log configmap changes in all other namespaces at the RequestResponse level.
      - level: RequestResponse
        resources:
        - group: "" # core API group
          resources: ["configmaps"]
      # Log secret changes in all namespaces at the Metadata level.
      - level: Metadata
        resources:
        - group: "" # core API group
          resources: ["secrets"]
      # Log all other resources in core and extensions at the Request level.
      - level: Request
        resources:
        - group: "" # core API group
        - group: "extensions" # Version of group should NOT be included.
      # A catch-all rule to log all other requests at the Metadata level.
      - level: Metadata
        # Long-running requests like watches that fall under this rule will not
        # generate an audit event in RequestReceived.
        omitStages:
        - "RequestReceived"
    name: audit-policy-config
    path: /srv/kubernetes/kube-apiserver/audit-policy-config.yaml
    roles:
    - ControlPlane
  iam:
    legacy: false
  keyStore: s3://config/testcluster-api.control.example.com/pki
  kubeAPIServer:
    admissionControlConfigFile: /srv/kubernetes/kube-apiserver/admission-configuration.yaml
    allowPrivileged: true
    anonymousAuth: false
    apiAudiences:
    - kubernetes.svc.default
    apiServerCount: 1
    appendAdmissionPlugins:
    - EventRateLimit
    auditLogMaxAge: 30
    auditLogMaxBackups: 10
    auditLogMaxSize: 100
    auditLogPath: /var/log/apiserver/audit.log
    auditPolicyFile: /srv/kubernetes/kube-apiserver/audit-policy-config.yaml
    authorizationMode: Node,RBAC
    bindAddress: 0.0.0.0
    cloudProvider: external
    enableAdmissionPlugins:
    - NamespaceLifecycle
    - LimitRanger
    - ServiceAccount
    - DefaultStorageClass
    - DefaultTolerationSeconds
    - MutatingAdmissionWebhook
    - ValidatingAdmissionWebhook
    - NodeRestriction
    - ResourceQuota
    - EventRateLimit
    enableProfiling: false
    etcdServers:
    - https://127.0.0.1:4001
    etcdServersOverrides:
    - /events#https://127.0.0.1:4002
    featureGates:
      CSIMigrationAWS: "true"
      InTreePluginAWSUnregister: "true"
    image: external-docker-registry/kube-apiserver:v1.26.3
    kubeletPreferredAddressTypes:
    - InternalIP
    - Hostname
    - ExternalIP
    logLevel: 2
    requestTimeout: 3m0s
    requestheaderAllowedNames:
    - aggregator
    requestheaderExtraHeaderPrefixes:
    - X-Remote-Extra-
    requestheaderGroupHeaders:
    - X-Remote-Group
    requestheaderUsernameHeaders:
    - X-Remote-User
    securePort: 443
    serviceAccountIssuer: https://api.internal.testcluster-api.control.example.com
    serviceAccountJWKSURI: https://api.internal.testcluster-api.control.example.com/openid/v1/jwks
    serviceClusterIPRange: 100.64.0.0/13
    storageBackend: etcd3
  kubeControllerManager:
    allocateNodeCIDRs: true
    attachDetachReconcileSyncPeriod: 1m0s
    cloudProvider: external
    clusterCIDR: 100.96.0.0/11
    clusterName: testcluster-api.control.example.com
    configureCloudRoutes: false
    enableProfiling: false
    featureGates:
      CSIMigrationAWS: "true"
      InTreePluginAWSUnregister: "true"
      RotateKubeletServerCertificate: "true"
    image: external-docker-registry/kube-controller-manager:v1.26.3
    leaderElection:
      leaderElect: true
    logLevel: 2
    terminatedPodGCThreshold: 10
    useServiceAccountCredentials: true
  kubeDNS:
    cacheMaxConcurrent: 150
    cacheMaxSize: 1000
    cpuRequest: 100m
    domain: cluster.local
    memoryLimit: 170Mi
    memoryRequest: 70Mi
    nodeLocalDNS:
      cpuRequest: 25m
      enabled: false
      image: registry.k8s.io/dns/k8s-dns-node-cache:1.22.20
      memoryRequest: 5Mi
    provider: CoreDNS
    serverIP: 100.64.0.10
  kubeProxy:
    clusterCIDR: 100.96.0.0/11
    cpuRequest: 100m
    image: external-docker-registry/kube-proxy:v1.26.3
    logLevel: 2
  kubeScheduler:
    enableProfiling: false
    featureGates:
      CSIMigrationAWS: "true"
      InTreePluginAWSUnregister: "true"
    image: external-docker-registry/kube-scheduler:v1.26.3
    leaderElection:
      leaderElect: true
    logLevel: 2
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    cgroupDriver: systemd
    cgroupRoot: /
    cloudProvider: external
    clusterDNS: 100.64.0.10
    clusterDomain: cluster.local
    enableDebuggingHandlers: true
    eventQPS: 0
    evictionHard: memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<15%,imagefs.inodesFree<5%
    evictionMaxPodGracePeriod: 120
    evictionSoft: memory.available<250Mi,nodefs.available<15%,nodefs.inodesFree<10%,imagefs.available<20%,imagefs.inodesFree<10%
    evictionSoftGracePeriod: memory.available=30s,imagefs.available=30s,nodefs.available=30s,imagefs.inodesFree=30s,nodefs.inodesFree=30s
    featureGates:
      CSIMigrationAWS: "true"
      InTreePluginAWSUnregister: "true"
      RotateKubeletServerCertificate: "true"
    kernelMemcgNotification: true
    kubeconfigPath: /var/lib/kubelet/kubeconfig
    logLevel: 2
    podInfraContainerImage: external-docker_registry
    podManifestPath: /etc/kubernetes/manifests
    protectKernelDefaults: true
    readOnlyPort: 0
    registerSchedulable: true
    shutdownGracePeriod: 30s
    shutdownGracePeriodCriticalPods: 10s
  kubernetesApiAccess:
  - 192.168.0.0/24
  kubernetesVersion: 1.26.3
  masterKubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    cgroupDriver: systemd
    cgroupRoot: /
    cloudProvider: external
    clusterDNS: 100.64.0.10
    clusterDomain: cluster.local
    enableDebuggingHandlers: true
    eventQPS: 0
    evictionHard: memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<15%,imagefs.inodesFree<5%
    evictionMaxPodGracePeriod: 120
    evictionSoft: memory.available<250Mi,nodefs.available<15%,nodefs.inodesFree<10%,imagefs.available<20%,imagefs.inodesFree<10%
    evictionSoftGracePeriod: memory.available=30s,imagefs.available=30s,nodefs.available=30s,imagefs.inodesFree=30s,nodefs.inodesFree=30s
    featureGates:
      CSIMigrationAWS: "true"
      InTreePluginAWSUnregister: "true"
      RotateKubeletServerCertificate: "true"
    kernelMemcgNotification: true
    kubeconfigPath: /var/lib/kubelet/kubeconfig
    logLevel: 2
    podInfraContainerImage: external-docker-registry/pause:3.6
    podManifestPath: /etc/kubernetes/manifests
    protectKernelDefaults: true
    readOnlyPort: 0
    registerSchedulable: true
    shutdownGracePeriod: 30s
    shutdownGracePeriodCriticalPods: 10s
  masterPublicName: testcluster-api.control.example.com
  networkCIDR: 192.168.2.0/23
  networkID: vpc-0bc336cc29fb4b2a7
  networking:
    calico:
      encapsulationMode: ipip
  nonMasqueradeCIDR: 100.64.0.0/10
  podCIDR: 100.96.0.0/11
  rollingUpdate:
    maxSurge: 50%
  secretStore: s3://config/testcluster-api.control.example.com/secrets
  serviceClusterIPRange: 100.64.0.0/13
  sshAccess:
  - 192.168.0.0/24
  subnets:
  - cidr: 192.168.2.0/26
    id: subnet-040987b99943b6cec
    name: aws_region
    type: Private
    zone: aws_region
  - cidr: 192.168.2.192/26
    id: subnet-091cac4d38a2ee136
    name: utility-aws_region
    type: Utility
    zone: aws_region
  topology:
    dns:
      type: Private
    masters: private
    nodes: private
I tested with a cluster using this command (which should generate a similar cluster):
$ kops-1.27.0-beta.3 --name my.k8s create cluster --cloud=aws --zones eu-central-1a \
--control-plane-size t3.medium --node-size t3.medium --networking calico \
--dns=private --topology=private --api-loadbalancer-type internal
The dns.alpha.kubernetes.io/internal: api.internal.my.k8s annotation is present in the kube-apiserver pod.
Running:
$ kops-1.27.0-beta.3 --name my.k8s export kubeconfig --admin --internal
$ cat ~/.kube/config
...
- cluster:
    certificate-authority-data: ...
    server: https://api.internal.my.k8s
    tls-server-name: api.internal.my.k8s
  name: my.k8s
I can also see the following DNS records:
- api.my.k8s - A - alias to LB
- api.my.k8s - AAAA - alias to LB
- api.internal.my.k8s - A - Simple - IP of control-plane node
- kops-controller.internal.my.k8s - A - Simple - IP of control-plane node
Looks to me like api.internal.my.k8s is the intended value, the IP of the API server.
Would you plan to revert the changes in newer versions, or add an option to use the old format?

> Would you plan to revert the changes in newer versions, or add an option to use the old format?

What do you mean?
@hakman - we are getting an SSL validation error for api.internal.${cluster_fqdn}, because no certificate Subject Alternative Name is generated for api.internal.${cluster_fqdn} in kubernetes-ca -> kubernetes-master.
Here is the error (note: the real cluster name is replaced with ${cluster_fqdn}):
15:39:54 E0105 14:39:54.417672 1672 memcache.go:265] couldn't get current server API group list:
Get "https://api-f27cb3e63-int-2riv26-1297bd82f345d86a.elb.eu-central-1.amazonaws.com:8443/api?timeout=32s":
tls: failed to verify certificate: x509: certificate is valid for kubernetes, kubernetes.default,
kubernetes.default.svc, kubernetes.default.svc.cluster.local, ${cluster_fqdn}, internal-${cluster_fqdn},
api-f27cb3e63-int-2riv26-1297bd82f345d86a.elb.eu-central-1.amazonaws.com, ->>>> not api.internal.${cluster_fqdn}
We are using AWS, Network LB
api:
  loadBalancer:
    type: Internal
    class: Network
    additionalSecurityGroups: ["{{.k8s_api_sg_id.value}}"]
    sslCertificate: {{.cluster_certificate_arn.value}}
......
......
masterInternalName: internal-{{.cluster_fqdn.value}}
So in previous versions of kops we were not getting SSL validation errors, since kubernetes-ca contained internal-{{.cluster_fqdn.value}}.
I can see all the SSL Alternative Names by opening the URL https://api-f27cb3e63-int-2riv26-1297bd82f345d86a.elb.eu-central-1.amazonaws.com:8443/api?timeout=32s.
So, in the new version there are no SSL Alternative Names for "api.internal.${cluster_fqdn}".
Perhaps there is another field for adding api.internal.${cluster_fqdn} to the kubernetes-ca -> kubernetes-master Alternative Names?
@aramhakobyan Did you update and roll all the nodes, and are you still seeing internal-${cluster_fqdn} and not api.internal.${cluster_fqdn}?
@hakman - well, "seeing" we do see it; the problem is not that the api.internal.${cluster_fqdn} DNS record is not generated, but that api.internal.${cluster_fqdn} is not in the kubernetes-ca -> kubernetes-master Alternative Names.
Order of upgrade from 1.25:
kops export kubecfg --admin
kops toolbox template --values cluster-config.json --values instance-groups.yaml --values core-instance-groups.yaml --values ../terraform/values.json --template cluster-template.yaml --snippets snippets --format-yaml
kops replace --force --filename cluster.yaml
kops update cluster --yes --lifecycle-overrides
kops export kubecfg --admin
kubectl get deployment .... -> getting cert error
When I check the certs by opening https://api-f27cb3e63-int-2riv26-1297bd82f345d86a.elb.eu-central-1.amazonaws.com:8443/api?timeout=32s.
@aramhakobyan What kops binary version are you using?
You should not be able to use masterInternalName anymore, and it should not be in the Alternative Names.
> Order of upgrade from 1.25...
@aramhakobyan I think you are missing a kops rolling-update cluster --yes :
https://kops.sigs.k8s.io/cli/kops_rolling-update_cluster/
> Order of upgrade from 1.25...
> @aramhakobyan I think you are missing a kops rolling-update cluster --yes: https://kops.sigs.k8s.io/cli/kops_rolling-update_cluster/
It was not mentioned, but we do run kops rolling-update cluster --yes.
What we see is that during this rolling update, once an old master instance is removed, its IP is also removed by kops from the Route53 record internal-${cluster_fqdn}. Once the last old master is gone, the record is deleted. After that, all remaining worker nodes become NotReady, as they still use the host internal-${cluster_fqdn} to talk to the masters.
I tried to manually create a CNAME internal-${cluster_fqdn} pointing to api.internal.${cluster_fqdn} after it was deleted by kops, and that would help, but the problem is that even after adding
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
spec:
  api:
    additionalSANs:
    - internal-{{.cluster_fqdn}}
this SAN is not added to the kops-generated cert, and all the old worker nodes complain like this:
certificate is valid for kubernetes, kubernetes.default, kubernetes.default.svc, kubernetes.default.svc.cluster.local, ${cluster_fqdn}, api.internal.${cluster_fqdn}, api-f9672c8f3-ipsk8st-int-084l9p-51f345d8f65d087c.elb.eu-central-1.amazonaws.com, not internal-${cluster_fqdn}
Are we doing something wrong? Is there a way out without downtime and without the --cloudonly flag?
Also, the cluster-completed.spec file in the S3 cluster state bucket does not have
spec:
  api:
    additionalSANs:
    - internal-{{.cluster_fqdn}}
so it looks like it is dropped during some conversion?
Changing it to
spec:
  additionalSANs:
  - internal-{{.cluster_fqdn}}
made kops add the needed SAN.
Is there a way to make kops keep the old record with the new masters' IP addresses, i.e. to have both internal-{{.cluster_fqdn}} and api.internal.{{.cluster_fqdn}}?
Adding a CNAME in the middle of a rolling update is not something we want to do :) I did it manually just to validate the theory.
@hakman - could you please check the last comments from @gerasym ?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale