Airgap deployment doesn't work

Open Its-Alex opened this issue 1 year ago • 6 comments

Before creating an issue, make sure you've checked the following:

  • [x] You are running the latest released version of k0s
  • [x] Make sure you've searched for existing issues, both open and closed
  • [x] Make sure you've searched for PRs too, a fix might've been merged already
  • [x] You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Platform

Linux 5.4.0-144-generic #161-Ubuntu SMP Fri Feb 3 14:49:04 UTC 2023 x86_64 GNU/Linux
NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu-Server 20.04.5 v2.0 LTS (Cubic 2023-01-10 09:05)"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Version

v1.30.4+k0s.0

Sysinfo

`k0s sysinfo`
Total memory: 62.5 GiB (pass)
Disk space available for /var/lib/k0s: 1.5 TiB (pass)
Name resolution: localhost: [127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 5.4.0-144-generic (pass)
  Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
  AppArmor: active (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 1 (pass)
    cgroup controller "cpu": available (pass)
    cgroup controller "cpuacct": available (pass)
    cgroup controller "cpuset": available (pass)
    cgroup controller "memory": available (pass)
    cgroup controller "devices": available (pass)
    cgroup controller "freezer": available (pass)
    cgroup controller "pids": available (pass)
    cgroup controller "hugetlb": available (pass)
    cgroup controller "blkio": available (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: built-in (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

This is a new installation of v1.30.4+k0s.0 using the airgap bundle, but I get an error at cluster startup:

$ k0s kubectl get pods -Aw
NAMESPACE     NAME                                       READY   STATUS              RESTARTS   AGE
kube-system   calico-kube-controllers-688fb7db9f-nfjkm   0/1     Pending             0          36m
kube-system   calico-node-p9v5d                          0/1     Init:0/1            0          36m
kube-system   coredns-74f779ff84-g8vsp                   0/1     Pending             0          36m
kube-system   kube-proxy-69nlm                           0/1     ContainerCreating   0          36m
kube-system   metrics-server-5cc4f44b94-nwb7f            0/1     Pending             0          36m
$ k0s kubectl -n kube-system describe pod calico-node-p9v5d
...
Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Normal   Scheduled               35m                   default-scheduler  Successfully assigned kube-system/calico-node-p9v5d to platform-airgap-ci-01
  Warning  DNSConfigForming        15m (x29 over 35m)    kubelet            Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 8.8.8.8 8.8.4.4 192.168.90.251
  Warning  FailedCreatePodSandBox  5m13s (x24 over 33m)  kubelet            Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = failed to get sandbox image "registry.k8s.io/pause:3.8": failed to pull image "registry.k8s.io/pause:3.8": failed to pull and unpack image "registry.k8s.io/pause:3.8": failed to resolve reference "registry.k8s.io/pause:3.8": failed to do request: Head "https://registry.k8s.io/v2/pause/manifests/3.8": dial tcp 34.96.108.209:443: i/o timeout
  Warning  FailedCreatePodSandBox  11s (x24 over 35m)    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "registry.k8s.io/pause:3.8": failed to pull image "registry.k8s.io/pause:3.8": failed to pull and unpack image "registry.k8s.io/pause:3.8": failed to resolve reference "registry.k8s.io/pause:3.8": failed to do request: Head "https://registry.k8s.io/v2/pause/manifests/3.8": dial tcp 34.96.108.209:443: i/o timeout

The registry.k8s.io/pause:3.8 image is not present in the airgap bundle from the release:

$ cat index.json | jq
{
  "schemaVersion": 2,
  "manifests": [
    ...
    {
      "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
      "digest": "sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097",
      "size": 2405,
      "annotations": {
        "io.containerd.image.name": "registry.k8s.io/pause:3.9",
        "org.opencontainers.image.ref.name": "3.9"
      }
    },
    ...
  ]
}
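
The same check can be scripted. The following is a minimal sketch, assuming index.json has already been extracted from the bundle as in the command above; the function names list_bundle_images and bundle_has_image are made up for illustration and are not part of k0s. It uses grep and sed only, so it works where jq is not installed.

```shell
# List every image reference recorded in an extracted bundle index.json.
# $1: path to index.json (assumed to be already extracted from the bundle)
list_bundle_images() {
  grep -o '"io.containerd.image.name"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" |
    sed 's/^"io\.containerd\.image\.name"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Succeed only if the exact image reference is listed in the index.
# $1: path to index.json, $2: image reference, e.g. registry.k8s.io/pause:3.8
bundle_has_image() {
  list_bundle_images "$1" | grep -qxF "$2"
}
```

For example, `bundle_has_image index.json registry.k8s.io/pause:3.8 || echo "not in bundle"` would print the message for the bundle described here.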

Steps to reproduce

  1. Perform an airgap installation of v1.30.4+k0s.0 in single-node mode, using the airgap bundle from the release

Expected behavior

Container registry.k8s.io/pause:3.8 should be present in airgap-images-list.txt of version v1.30.4+k0s.0 on releases.

Actual behavior

Container registry.k8s.io/pause:3.8 is not present in airgap-images-list.txt of version v1.30.4+k0s.0 on releases.

Screenshots and logs

No response

Additional context

No response

Its-Alex avatar Oct 23 '24 09:10 Its-Alex

I made an update from v1.23.17+k0s.0 to v1.30.4+k0s.0

Wow, that's brave. Updates are only supported from minor version to minor version, basically along the lines of what the Kubernetes Version Skew Policy mandates.

Container registry.k8s.io/pause:3.8 should be present in airgap-images-list.txt of version v1.30.4+k0s.0 on releases.

I think you may be suffering from hardcoded image versions in the k0s configuration. Can you check if you've explicitly specified the images in the k0s config? Unless you're using a private registry or something, it's usually best not to specify them at all. For context, older k0s releases used to include the images in the generated default configuration, but stopped doing so because of exactly the problem you're having here.

twz123 avatar Oct 23 '24 09:10 twz123

@twz123

Wow, that's brave. Updates are only supported from minor version to minor version, basically along the lines of what the Kubernetes Version Skew Policy mandates.

The error happened on a new installation, not on an upgrade. I never got as far as upgrading. Sorry for the inaccuracy (I edited the comment).

I think you may be suffering from hardcoded image versions in the k0s configuration. Can you check if you've explicitly specified the images in the k0s config? Unless you're using a private registry or something, it's usually best not to specify them at all. For context, older k0s releases used to include the images in the generated default configuration, but stopped doing so (https://github.com/k0sproject/k0s/issues/2587) because of exactly the problem you're having here.

I don't explicitly specify the images, so I'm using the k0s defaults.

Its-Alex avatar Oct 23 '24 09:10 Its-Alex

K0s configures the pause image to be used by containerd according to the k0s configuration (which defaults to version 3.9 currently). So if containerd is not picking that up, there has to be something off with the containerd configuration. Can you check your containerd configuration files? Can you also check the contents of /run/k0s/containerd-cri.toml?
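
For readers following along, one way to run that check is sketched below. It assumes the pause image appears under the CRI plugin's sandbox_image key in these files; show_sandbox_image is a hypothetical helper name, not a k0s command. If no file sets the key, containerd falls back to its built-in default, which for the containerd 1.7 series is, as far as I know, registry.k8s.io/pause:3.8, the image named in the error above.

```shell
# Report the sandbox_image setting (if any) in each containerd config file.
# Missing files are reported instead of skipped, since a missing
# /run/k0s/containerd-cri.toml is itself a useful clue.
show_sandbox_image() {
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      printf '%s: missing\n' "$f"
    elif match=$(grep -E '^[[:space:]]*sandbox_image[[:space:]]*=' "$f"); then
      printf '%s: %s\n' "$f" "$match"
    else
      printf '%s: no sandbox_image set\n' "$f"
    fi
  done
}
```

A typical call would be `show_sandbox_image /etc/k0s/containerd.toml /run/k0s/containerd-cri.toml`.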

twz123 avatar Oct 23 '24 10:10 twz123

@twz123 For containerd config:

$ cat /run/k0s/containerd-cri.toml
cat: /run/k0s/containerd-cri.toml: No such file or directory
$ cat /etc/k0s/containerd.toml
# This is the configuration for k0s managed containerD.
# For reference see https://github.com/containerd/containerd/blob/main/docs/man/containerd-config.toml.5.md
version = 2

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = "io.containerd.runc.v2"

This is our k0sctl configuration, in case you need more information:

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: custom-cluster
spec:
  hosts:
  - localhost:
      enabled: true
    role: single
    uploadBinary: true
    k0sBinaryPath: /opt/custom/core/bin/k0s
    installFlags:
    - --profile=custom-k0s-profile
    files:
    - name: k0s-bundle
      src: /opt/custom/core/images/k0s-airgap-bundle.tar
      dstDir: /var/lib/k0s/images/
      dst: ""
      perm: "0755"
      dirPerm: null
      user: ""
      group: ""
    - name: containerd-config
      src: /opt/custom/core/containerd.toml
      dstDir: /etc/k0s/
      dst: ""
      perm: "0755"
      dirPerm: null
      user: ""
      group: ""
  k0s:
    version: v1.30.4+k0s.0
    config:
      spec:
        api:
          extraArgs:
            feature-gates: HPAScaleToZero=true
        controllerManager:
          extraArgs:
            horizontal-pod-autoscaler-tolerance: "0.001"
        images:
          default_pull_policy: Never
        network:
          provider: calico
        telemetry:
          enabled: false
        workerProfiles:
        - name: custom-k0s-profile
          values:
            imageGCHighThresholdPercent: 100
            imageMinimumGCAge: 876000h
            maxPods: 200
        - name: custom-k0s-profile-with-cpu-optimization
          values:
            cpuManagerPolicy: static
            cpuManagerPolicyOptions:
              full-pcpus-only: "true"
            imageGCHighThresholdPercent: 100
            imageMinimumGCAge: 876000h
            maxPods: 200
            systemReserved:
              cpu: "4"
              memory: 1Gi

And the worker configuration:

$ cat /var/lib/k0s/worker-profile.yaml
data:
  apiServerAddresses: '["192.168.30.215:6443"]'
  konnectivity: '{"agentPort":8132}'
  kubeletConfiguration: '{"kind":"KubeletConfiguration","apiVersion":"kubelet.config.k8s.io/v1beta1","syncFrequency":"0s","fileCheckFrequency":"0s","httpCheckFrequency":"0s","tlsCipherSuites":["TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256","TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384","TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256","TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256","TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384","TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256"],"tlsMinVersion":"VersionTLS12","rotateCertificates":true,"serverTLSBootstrap":true,"authentication":{"x509":{},"webhook":{"cacheTTL":"0s"},"anonymous":{}},"authorization":{"webhook":{"cacheAuthorizedTTL":"0s","cacheUnauthorizedTTL":"0s"}},"eventRecordQPS":0,"clusterDomain":"cluster.local","clusterDNS":["10.96.0.10"],"streamingConnectionIdleTimeout":"0s","nodeStatusUpdateFrequency":"0s","nodeStatusReportFrequency":"0s","imageMinimumGCAge":"876000h0m0s","imageMaximumGCAge":"0s","imageGCHighThresholdPercent":100,"volumeStatsAggPeriod":"0s","cpuManagerReconcilePeriod":"0s","runtimeRequestTimeout":"0s","maxPods":200,"evictionPressureTransitionPeriod":"0s","failSwapOn":false,"memorySwap":{},"logging":{"flushFrequency":0,"verbosity":0,"options":{"text":{"infoBufferSize":"0"},"json":{"infoBufferSize":"0"}}},"shutdownGracePeriod":"0s","shutdownGracePeriodCriticalPods":"0s","containerRuntimeEndpoint":""}'
  nodeLocalLoadBalancing: '{"type":"EnvoyProxy","envoyProxy":{"image":{"image":"quay.io/k0sproject/envoy-distroless","version":"v1.30.4"},"imagePullPolicy":"Never","apiServerBindPort":7443,"konnectivityServerBindPort":7132}}'
  pauseImage: '{"image":"registry.k8s.io/pause","version":"3.9"}'
name: custom-k0s-profile

Its-Alex avatar Oct 23 '24 10:10 Its-Alex

Can you delete the /etc/k0s/containerd.toml file and try again? It's an old one. The current version should have a header like this. I suppose it didn't get updated because you skipped several minor versions during the upgrade, and hence the code that did the migration to the newer versions wasn't executed.

twz123 avatar Oct 23 '24 11:10 twz123

@twz123 Okay, I think I understand what's wrong: I overwrite the containerd configuration to add credentials for my registry. There is a way to do this directly in k0s. I will try it and tell you if it works. Thank you very much for your time!

Its-Alex avatar Oct 23 '24 11:10 Its-Alex

@twz123 I can confirm that with the predefined containerd configuration, everything works. Sorry for disturbing you, and thank you very much for your help.

Its-Alex avatar Nov 05 '24 13:11 Its-Alex

Hi, how did you manage to get your airgap install working with the registry config? I'm running into a similar issue: when I apply my containerd.toml and restart k0s, or just deploy with my modified config, it gets stuck failing to pull any of the images and/or pause.

sorry forgot to @Its-Alex

killergoalie avatar Jan 14 '25 22:01 killergoalie

@killergoalie I fixed it by using /etc/k0s/containerd.d/ instead of completely replacing the default containerd configuration, as documented:

As of 1.27.1, k0s allows dynamic configuration of containerd CRI runtimes. This works by k0s creating a special directory in /etc/k0s/containerd.d/ where users can place partial containerd configuration files.
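
As an illustration of such a partial file, here is a minimal sketch that writes one drop-in per registry. It is not the exact configuration used in this thread: the helper name write_registry_auth, the auth-HOST.toml file name, and the host and credentials in the usage line are all placeholders. The TOML uses containerd's CRI registry.configs auth section; newer containerd releases mark that section as deprecated, so check it against the containerd version your k0s release ships.

```shell
# Write a containerd drop-in that adds credentials for a single registry.
# Usage: write_registry_auth DIR HOST USERNAME PASSWORD
# Values are written verbatim, so they must not contain double quotes.
write_registry_auth() {
  dir=$1 host=$2 user=$3 pass=$4
  mkdir -p "$dir" || return 1
  cat > "$dir/auth-$host.toml" <<EOF
version = 2

[plugins."io.containerd.grpc.v1.cri".registry.configs."$host".auth]
  username = "$user"
  password = "$pass"
EOF
  # The file holds a plain-text password, so keep it readable by its owner only.
  chmod 0600 "$dir/auth-$host.toml"
}
```

On a node this would be called as root, for example `write_registry_auth /etc/k0s/containerd.d registry.example.com myuser mypassword`. Keeping credentials in a drop-in leaves the k0s-managed /etc/k0s/containerd.toml untouched, which is what avoids the stale-configuration problem described earlier in the thread.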

Its-Alex avatar Jan 16 '25 21:01 Its-Alex

@Its-Alex Thanks. Last question: are you using the wildcard with your registry config, or explicitly calling out every registry? I'm trying to wrap my head around the V1/V2/V3 containerd config formats.

killergoalie avatar Jan 21 '25 17:01 killergoalie

@killergoalie I'm not really sure what your problem is. I use something similar to this article. If I need to configure more than one registry, I use one config per registry.

Its-Alex avatar Jan 22 '25 13:01 Its-Alex

@Its-Alex I think my issue is that I'm trying to use the older V1 wildcard, as you can find here: https://github.com/containerd/cri/blob/release/1.4/docs/registry.md#configure-registry-endpoint
I think I found a path forward. I didn't want to create a config for each mirror, just one blanket config for all of them, which is why the old config worked.

I did have a follow-up question about the secrets. Are you using this for the registry auth? https://kubernetes.io/docs/reference/kubectl/generated/kubectl_create/kubectl_create_secret_docker-registry/

killergoalie avatar Jan 22 '25 15:01 killergoalie

@killergoalie Sorry, but I lack a clear understanding of your problem. From what I can gather:

@Its-Alex I think my issue is that I'm trying to use the older V1 wildcard, as you can find here: https://github.com/containerd/cri/blob/release/1.4/docs/registry.md#configure-registry-endpoint

It seems you're trying to configure a proxy URL for multiple registries, not the registry itself here.

I did have a follow-up question about the secrets. Are you using this for registry authentication? https://kubernetes.io/docs/reference/kubectl/generated/kubectl_create/kubectl_create_secret_docker-registry/

There are many ways to handle registry authentication in Kubernetes, either directly in CRI or in Kubernetes (as explained in the documentation you shared).

I don't think this issue is related to your problem. Could you use Stack Overflow or open a separate issue to resolve it?

Its-Alex avatar Jan 23 '25 16:01 Its-Alex

@Its-Alex Thanks for your time, agreed I think these are not related. Will open an issue of my own. Again thanks for the time.

killergoalie avatar Jan 23 '25 17:01 killergoalie