
Flatcar: hostname -f Does Not Return FQDN, Causing Kubelet Identity Mismatch in Kubernetes

Open ealogar opened this issue 6 months ago • 4 comments

Description

On Flatcar Linux (Flatcar-stable-4230.2.0-hvm and later alpha versions), the default hostname configuration does not return a fully qualified domain name (FQDN) for hostname -f. It returns only the short hostname (e.g., ip-10-54-69-123), which breaks Kubernetes setups where the kubelet is expected to identify itself by FQDN.

Impact

This causes the kubelet to register with a short hostname, while Kubernetes and related RBAC policies expect the fully qualified hostname, resulting in authentication errors and node registration failures.

Environment and steps to reproduce

1. Set-up:

    Flatcar Linux (tested with stable channel)

    Running on AWS EC2 with the AMI published yesterday

    Kubelet launched with hostname configuration relying on hostname -f

    Using cloud-init or ignition to configure /etc/kubernetes/nodeip.conf
2. Task: Start a Flatcar instance and run a Kubernetes kubelet.
Action(s):
a. Boot Flatcar node
b. Kubelet reads KUBELET_HOSTNAME=$(hostname -f)
c. Kubelet uses short hostname to authenticate (e.g., system:node:ip-10-54-69-123)
d. Kubernetes API expects FQDN (e.g., ip-10-54-69-123.<region>.xxx.zzzz)
e. Authentication fails

Error:
Failed to list *v1.Node: nodes "ip-10-54-69-123.<region>.xxx.zzzz" is forbidden: User "system:node:ip-10-54-69-123" cannot list resource "nodes"
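
The mismatch in steps c and d can be caught before the kubelet starts. The following is a minimal sketch, not part of the original report: check_node_identity is a hypothetical helper that compares the name the cluster expects with the name the host actually reports.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not from the issue): compare the node name
# the API server expects with the name the kubelet would register under.
check_node_identity() {
    expected=$1   # FQDN the cluster expects
    actual=$2     # what the host reports, e.g. the output of `hostname -f`
    if [ "$actual" = "$expected" ]; then
        printf 'ok: kubelet identity system:node:%s\n' "$actual"
    else
        printf 'mismatch: kubelet would be system:node:%s, expected system:node:%s\n' \
            "$actual" "$expected"
        return 1
    fi
}
```

On a node it could be called as check_node_identity "$EXPECTED_FQDN" "$(hostname -f)", where EXPECTED_FQDN is a placeholder for whatever name your provisioning tooling expects.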

Expected behavior

hostname -f should return the fully qualified domain name (e.g., ip-10-54-69-123.<region>.xxx.zzzz), ensuring kubelet authentication and identity match what the Kubernetes API server expects.

Additional information

As a workaround, explicitly setting the FQDN in KUBELET_HOSTNAME using cloud metadata (e.g., /run/metadata/coreos) resolves the issue.
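
A rough sketch of that workaround follows. It assumes the metadata file is a plain KEY=VALUE environment file and that the hostname key is named COREOS_EC2_HOSTNAME; both are assumptions to verify on your own instance, and the helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from a KEY=VALUE metadata file.
metadata_value() {
    file=$1
    key=$2
    # Strip the "KEY=" prefix from the matching line and print the remainder.
    sed -n "s/^${key}=//p" "$file" | head -n 1
}

# Hypothetical helper: prefer the provider-supplied hostname and fall back to
# `hostname -f` when the file or the key is missing.
kubelet_hostname() {
    file=$1
    # COREOS_EC2_HOSTNAME is an assumed key name; check your metadata file.
    name=$(metadata_value "$file" COREOS_EC2_HOSTNAME 2>/dev/null)
    if [ -n "$name" ]; then
        printf '%s\n' "$name"
    else
        hostname -f
    fi
}
```

For example, KUBELET_HOSTNAME=$(kubelet_hostname /run/metadata/coreos) would pick up the provider value when present. The fallback keeps the unit usable on platforms where no metadata file exists.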

ealogar avatar Jun 25 '25 21:06 ealogar

@tormath1 this is generally the same for all providers. I tried this on other cloud providers as well, and we get the short hostname instead of the FQDN. For Kubernetes, though, AWS is the only provider I'm aware of with a hard requirement that the hostname be the FQDN.

You might get a node that joins your cluster and is initially Ready, but the CCM will be unable to locate the node in AWS, so the node keeps the taint node.cloudprovider.kubernetes.io/uninitialized and is not schedulable.

    taints:
    - effect: NoSchedule
      key: node.cloudprovider.kubernetes.io/uninitialized
      value: "true"
    - effect: NoSchedule
      key: node.kubernetes.io/unreachable
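
To spot affected nodes, one option is to check the taint keys for the uninitialized marker. This is a sketch only: has_uninitialized_taint is a hypothetical helper that takes a space-separated list of taint keys, such as the list kubectl prints for the jsonpath expression {.spec.taints[*].key}.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the space-separated list of taint keys in $1
# contains the cloud-provider "uninitialized" taint.
has_uninitialized_taint() {
    # $1 is intentionally unquoted so the shell splits it into separate keys.
    for key in $1; do
        if [ "$key" = "node.cloudprovider.kubernetes.io/uninitialized" ]; then
            return 0
        fi
    done
    return 1
}
```

With cluster access it could be used as has_uninitialized_taint "$(kubectl get node "$NODE" -o jsonpath='{.spec.taints[*].key}')", where NODE is a placeholder for the node name.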

ahmedwaleedmalik avatar Jun 26 '25 08:06 ahmedwaleedmalik

Hello @ealogar, and thanks for the feedback and the detailed issue. This is a side effect of the systemd upgrade (from 255 to 256). You can restore the previous behavior with this drop-in (applied manually or via Butane/Ignition):

# /etc/systemd/system/systemd-resolved.service.d/override.conf
[Service]
Environment=SYSTEMD_RESOLVED_SYNTHESIZE_HOSTNAME=false

or

variant: flatcar
version: 1.1.0
systemd:
  units:
    - name: systemd-resolved.service
      dropins:
        - name: hostname-fqdn.conf
          contents: |
            [Service]
            Environment=SYSTEMD_RESOLVED_SYNTHESIZE_HOSTNAME=false

(ref: https://github.com/systemd/systemd/blob/a7bfb9f76b96888d60b4f287f29dcbf758ba34c0/docs/ENVIRONMENT.md?plain=1#L384-L386)
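
For the manual route, the drop-in can be written with a few lines of shell. This is a sketch under assumptions: install_resolved_dropin is a hypothetical helper, and its optional argument (the unit directory root) exists only so the function can be tried out against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: write the systemd-resolved drop-in shown above.
# $1 is the systemd unit directory root (defaults to /etc/systemd/system).
install_resolved_dropin() {
    root=${1:-/etc/systemd/system}
    dir="$root/systemd-resolved.service.d"
    mkdir -p "$dir" || return 1
    # Quoted heredoc delimiter: the content is written verbatim.
    cat > "$dir/override.conf" <<'EOF'
[Service]
Environment=SYSTEMD_RESOLVED_SYNTHESIZE_HOSTNAME=false
EOF
}
```

After running it as root, systemctl daemon-reload followed by systemctl restart systemd-resolved should apply the setting; systemctl show -p Environment systemd-resolved and hostname -f can then be used to confirm the result.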

Let us know if it works.

tormath1 avatar Jun 26 '25 08:06 tormath1

@tormath1: systemd-resolved is not the default resolver on Flatcar, is it? (https://www.flatcar.org/docs/latest/setup/customization/configuring-dns/)

Do our kola tests on AWS catch this issue?

jepio avatar Jun 26 '25 10:06 jepio

systemd-resolved is the default resolver according to the documentation:

By default, DNS resolution on Flatcar Container Linux is handled through /etc/resolv.conf, which is a symlink to /run/systemd/resolve/resolv.conf. This file is managed by systemd-resolved.

On an AWS instance, strace of hostname -f shows:

connect(3, {sa_family=AF_UNIX, sun_path="/run/systemd/resolve/io.systemd.Resolve"}, 42) = 0
sendto(3, "{\"method\":\"io.systemd.Resolve.ResolveHostname\",\"parameters\":{\"name\":\"ip-1.2.3.4\",\"family\":2,\"flags\":0}}\0"

There is a call to the io.systemd.Resolve socket. Calling this method manually, with synthesis disabled, gives:

$ systemctl show -p Environment systemd-resolved
Environment=SYSTEMD_RESOLVED_SYNTHESIZE_HOSTNAME=false
$ varlinkctl call /run/systemd/resolve/io.systemd.Resolve io.systemd.Resolve.ResolveHostname '{"name":"ip-1.2.3.4","family":2}' -j | jq .name
"ip-1.2.3.4.eu-central-1.compute.internal"

Now, with the default value (true):

$ systemctl show -p Environment systemd-resolved
Environment=SYSTEMD_RESOLVED_SYNTHESIZE_HOSTNAME=true
$ varlinkctl call /run/systemd/resolve/io.systemd.Resolve io.systemd.Resolve.ResolveHostname '{"name":"ip-1.2.3.4","family":2}' -j | jq .name
"ip-1.2.3.4"

Do our kola tests on AWS catch this issue?

It looks like they don't, because the node names there are the IPv4 addresses and not the hostnames.

tormath1 avatar Jun 26 '25 13:06 tormath1

The upstream issue has been closed with https://github.com/kubermatic/operating-system-manager/pull/479, so I will go ahead and close this issue as well.

For folks having the same issue, there are two ways to mitigate this side effect of the systemd 256 upgrade:

  • Use Environment=SYSTEMD_RESOLVED_SYNTHESIZE_HOSTNAME=false in systemd-resolved.service
  • Use the metadata from the provider in /run/metadata/flatcar

tormath1 avatar Jul 07 '25 07:07 tormath1