Flatcar: hostname -f Does Not Return FQDN, Causing Kubelet Identity Mismatch in Kubernetes
Description
On Flatcar Container Linux Stable 4230.2.0 (Flatcar-stable-4230.2.0-hvm) and newer Alpha releases, the default hostname configuration does not return a fully qualified domain name (FQDN) when using `hostname -f`. Instead, it returns only the short hostname (e.g., ip-10-54-69-123), which leads to issues in Kubernetes, where the kubelet is expected to identify itself using the FQDN.
Impact
This causes the kubelet to register with a short hostname, while Kubernetes and related RBAC policies expect the fully qualified hostname, resulting in authentication errors and node registration failures.
Environment and steps to reproduce
1. Set-up:
   - Flatcar Container Linux (tested with the Stable channel)
   - Running on AWS EC2 with a recently published AMI
   - Kubelet launched with a hostname configuration relying on `hostname -f`
   - Using cloud-init or Ignition to configure /etc/kubernetes/nodeip.conf
2. Task: Start a Flatcar instance and run a Kubernetes kubelet.
3. Action(s):
   a. Boot the Flatcar node
   b. The kubelet reads KUBELET_HOSTNAME=$(hostname -f)
   c. The kubelet authenticates using the short hostname (e.g., system:node:ip-10-54-69-123)
   d. The Kubernetes API expects the FQDN (e.g., ip-10-54-69-123.<region>.xxx.zzzz)
   e. Authentication fails
4. Error:
```
Failed to list *v1.Node: nodes "ip-10-54-69-123.<region>.xxx.zzzz" is forbidden: User "system:node:ip-10-54-69-123" cannot list resource "nodes"
```
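For a quick check on the node itself, comparing the short and qualified hostnames shows the mismatch (illustrative output, using the example node from above):

```sh
# On an affected node, both commands return the short name.
hostname      # ip-10-54-69-123
hostname -f   # ip-10-54-69-123  (expected: ip-10-54-69-123.<region>.xxx.zzzz)
```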
Expected behavior
`hostname -f` should return the fully qualified domain name (e.g., ip-10-54-69-123.<region>.xxx.zzzz).
Additional information
As a workaround, explicitly setting the FQDN in KUBELET_HOSTNAME using cloud metadata (e.g., /run/metadata/coreos) resolves the issue.
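A minimal sketch of that workaround on AWS, querying the instance metadata service (IMDSv2) instead of relying on `hostname -f`; how the value gets wired into the kubelet unit depends on your setup:

```sh
# Fetch the EC2 private DNS name (FQDN) from the instance metadata service (IMDSv2).
TOKEN=$(curl -sX PUT http://169.254.169.254/latest/api/token \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
KUBELET_HOSTNAME=$(curl -sH "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/hostname)
# Pass it to the kubelet, e.g. via --hostname-override="${KUBELET_HOSTNAME}".
```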
@tormath1 this is in general the same for all providers. I tried this on other cloud providers as well and we get the short hostname instead of the FQDN. However, for Kubernetes, AWS is the only provider I'm aware of that has a hard requirement for the hostname to be the FQDN.
You might get a node that joins your cluster and is initially Ready, but the CCM will be unable to locate the node in AWS, so it will carry the node.cloudprovider.kubernetes.io/uninitialized taint and will not be schedulable:
```yaml
taints:
- effect: NoSchedule
  key: node.cloudprovider.kubernetes.io/uninitialized
  value: "true"
- effect: NoSchedule
  key: node.kubernetes.io/unreachable
```
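The taints on such a node can be inspected with kubectl, for example:

```sh
# Show the taints on the node that registered with the short hostname.
kubectl get node ip-10-54-69-123 -o jsonpath='{.spec.taints}'
```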
Hello @ealogar, and thanks for the feedback and the detailed issue. This is a side effect of the systemd upgrade (from 255 to 256); you can get back to the previous behavior with this drop-in (applied manually or via Butane/Ignition):
```ini
# /etc/systemd/system/systemd-resolved.service.d/override.conf
[Service]
Environment=SYSTEMD_RESOLVED_SYNTHESIZE_HOSTNAME=false
```
or
```yaml
variant: flatcar
version: 1.1.0
systemd:
  units:
    - name: systemd-resolved.service
      dropins:
        - name: hostname-fqdn.conf
          contents: |
            [Service]
            Environment=SYSTEMD_RESOLVED_SYNTHESIZE_HOSTNAME=false
```
(ref: https://github.com/systemd/systemd/blob/a7bfb9f76b96888d60b4f287f29dcbf758ba34c0/docs/ENVIRONMENT.md?plain=1#L384-L386)
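Once the drop-in is in place, restarting systemd-resolved and re-checking should show the old behavior again, roughly:

```sh
# Restart the resolver and verify the override took effect.
sudo systemctl daemon-reload
sudo systemctl restart systemd-resolved
systemctl show -p Environment systemd-resolved   # Environment=SYSTEMD_RESOLVED_SYNTHESIZE_HOSTNAME=false
hostname -f                                      # should return the FQDN again
```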
Let us know if it works.
@tormath1: systemd-resolved is not the default resolver on Flatcar, is it? (https://www.flatcar.org/docs/latest/setup/customization/configuring-dns/)
Do our kola tests on AWS catch this issue?
systemd-resolved is the default resolver according to the documentation:
> By default, DNS resolution on Flatcar Container Linux is handled through /etc/resolv.conf, which is a symlink to /run/systemd/resolve/resolv.conf. This file is managed by systemd-resolved.
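This can be confirmed on a node with:

```sh
# Confirm the symlink and that systemd-resolved is running.
readlink -f /etc/resolv.conf          # /run/systemd/resolve/resolv.conf
systemctl is-active systemd-resolved  # active
```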
On an AWS instance, an strace of `hostname -f` shows:
```
connect(3, {sa_family=AF_UNIX, sun_path="/run/systemd/resolve/io.systemd.Resolve"}, 42) = 0
sendto(3, "{\"method\":\"io.systemd.Resolve.ResolveHostname\",\"parameters\":{\"name\":\"ip-1.2.3.4\",\"family\":2,\"flags\":0}}\0"
```
There is a call to the io.systemd.Resolve socket, and calling this method manually gives:
```
$ systemctl show -p Environment systemd-resolved
Environment=SYSTEMD_RESOLVED_SYNTHESIZE_HOSTNAME=false
$ varlinkctl call /run/systemd/resolve/io.systemd.Resolve io.systemd.Resolve.ResolveHostname '{"name":"ip-1.2.3.4","family":2}' -j | jq .name
"ip-1.2.3.4.eu-central-1.compute.internal"
```
Now, with the default value:
```
$ systemctl show -p Environment systemd-resolved
Environment=SYSTEMD_RESOLVED_SYNTHESIZE_HOSTNAME=true
$ varlinkctl call /run/systemd/resolve/io.systemd.Resolve io.systemd.Resolve.ResolveHostname '{"name":"ip-1.2.3.4","family":2}' -j | jq .name
"ip-1.2.3.4"
```
> Do our kola tests on AWS catch this issue?
Looks like they don't, because the node names are the IPv4 addresses and not the hostnames.
Upstream issue has been closed with: https://github.com/kubermatic/operating-system-manager/pull/479 - I will go ahead and close this issue as well.
For folks having the same issue, there are two solutions to mitigate this side effect of the systemd 256 upgrade:
- Use `Environment=SYSTEMD_RESOLVED_SYNTHESIZE_HOSTNAME=false` in `systemd-resolved.service`
- Use the metadata from the provider in `/run/metadata/flatcar`, as sketched below
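As an illustration of the second option, here is a sketch of a kubelet wrapper that reads the metadata file; the variable name AFTERBURN_AWS_HOSTNAME is an assumption, so check the actual contents of /run/metadata/flatcar on your node:

```sh
# Sketch: read the provider metadata written to /run/metadata/flatcar and override the kubelet hostname.
# AFTERBURN_AWS_HOSTNAME is an assumed key name; inspect the file for the exact one on your node.
set -a
. /run/metadata/flatcar   # lines of the form KEY=value
set +a
exec /usr/bin/kubelet --hostname-override="${AFTERBURN_AWS_HOSTNAME}" "$@"
```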