
usr/local/bin/nerdctl: not found when running kubespray with vagrant

Open romch007 opened this issue 1 year ago • 23 comments

I am trying to install kubespray using the provided Vagrantfile. The only changes I made were:

 $num_instances ||= 3
 $instance_name_prefix ||= "k8s"
 $vm_gui ||= false
-$vm_memory ||= 2048
-$vm_cpus ||= 2
+$vm_memory ||= 4096
+$vm_cpus ||= 3
 $shared_folders ||= {}
 $forwarded_ports ||= {}
-$subnet ||= "172.18.8"
+$subnet ||= "192.168.56"
 $subnet_ipv6 ||= "fd3c:b398:0698:0756"
 $os ||= "ubuntu2004"
 $network_plugin ||= "flannel"
@@ -254,6 +254,7 @@ Vagrant.configure("2") do |config|
       # And limit the action to gathering facts, the full playbook is going to be ran by testcases_run.sh
       if i == $num_instances
         node.vm.provision "ansible" do |ansible|
+          ansible.compatibility_mode = "2.0"
           ansible.playbook = $playbook
           ansible.verbose = $ansible_verbosity
           $ansible_inventory_path = File.join( $inventory, "hosts.ini")

All the other files of the repo are unchanged.

Environment:

  • Cloud provider or hardware configuration: VirtualBox 7.0.8

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):

Linux 6.3.9-arch1-1 x86_64
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
PRIVACY_POLICY_URL="https://terms.archlinux.org/docs/privacy-policy/"
LOGO=archlinux-logo
  • Version of Ansible (ansible --version):
ansible [core 2.15.1]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/romain/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.11/site-packages/ansible
  ansible collection location = /home/romain/.ansible/collections:/usr/share/ansible/collections
  executable location = /bin/ansible
  python version = 3.11.3 (main, Jun  5 2023, 09:32:32) [GCC 13.1.1 20230429] (/usr/bin/python)
  jinja version = 3.1.2
  libyaml = True
  • Version of Python (python --version): Python 3.11.3

Kubespray version (commit) (git rev-parse --short HEAD): b42757d33

Network plugin used: flannel

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

Command used to invoke ansible: vagrant up

Output of ansible run:

On every node:

TASK [download : download_container | Load image into the local container registry]
fatal: [k8s-1]: FAILED! => {"changed": true, "cmd": "/usr/local/bin/nerdctl -n k8s.io image load < /tmp/releases/images/docker.io_flannel_flannel_v0.22.0.tar", "delta": "0:00:00.004128", "end": "2023-06-30 16:34:15.315823", "failed_when_result": true, "msg": "non-zero return code", "rc": 127, "start": "2023-06-30 16:34:15.311695", "stderr": "/bin/sh: 1: /usr/local/bin/nerdctl: not found", "stderr_lines": ["/bin/sh: 1: /usr/local/bin/nerdctl: not found"], "stdout": "", "stdout_lines": []}
fatal: [k8s-2]: same
fatal: [k8s-3]: same

Anything else we need to know:

romch007 avatar Jun 30 '23 16:06 romch007

I'm seeing the same behavior with Debian 12 VMs hosted by Proxmox. Kubespray downloads nerdctl (among others) correctly to /tmp/releases but then doesn't copy it to /usr/local/bin - it skips right to trying to pull the flannel image with nerdctl and gets [Errno 2] No such file or directory: b'/usr/local/bin/nerdctl'

The full traceback is:
  File "/tmp/ansible_ansible.legacy.command_payload_uw5dw5yf/ansible_ansible.legacy.command_payload.zip/ansible/module_utils/basic.py", line 2030, in run_command
    cmd = subprocess.Popen(args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 1024, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.11/subprocess.py", line 1901, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
fatal: [node1]: FAILED! => {
    "attempts": 4,
    "changed": false,
    "cmd": "/usr/local/bin/nerdctl -n k8s.io pull --quiet docker.io/flannel/flannel:v0.22.0",
    "invocation": {
        "module_args": {
            "_raw_params": "/usr/local/bin/nerdctl -n k8s.io pull --quiet  docker.io/flannel/flannel:v0.22.0",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true
        }
    },
    "msg": "[Errno 2] No such file or directory: b'/usr/local/bin/nerdctl'",
    "rc": 2,
    "stderr": "",
    "stderr_lines": [],
    "stdout": "",
    "stdout_lines": []

wolskies avatar Jul 02 '23 20:07 wolskies

Same errors with Debian 11.7 , Vagrant 2.3.7 and Virtualbox 7.0.8.

mickaelmonsieur avatar Jul 03 '23 15:07 mickaelmonsieur

Small fix:

cp /tmp/releases/nerdctl /usr/local/bin/nerdctl && cp /tmp/releases/crictl /usr/local/bin/crictl

and relaunch ansible.

mickaelmonsieur avatar Jul 03 '23 20:07 mickaelmonsieur

Thanks @romch007 @mickaelmonsieur

If you'd like to, feel free to provide a PR. :-)

Thank you very much.

yankay avatar Jul 04 '23 09:07 yankay

I did that; it gets past the immediate problem of nerdctl not being in /usr/local/bin, but it fails later trying to create the kubeadm token (on all nodes). I think it's related (it seems like nerdctl, crictl and runc get downloaded but not configured):

TASK [kubernetes/control-plane : Create kubeadm token for joining nodes with 24h expiration (default)] ******************************************
task path: /Users/ed/Kube/kubespray/roles/kubernetes/control-plane/tasks/kubeadm-setup.yml:207
fatal: [node2 -> node1(192.168.1.73)]: FAILED! => {
    "attempts": 5,
    "changed": false,
    "cmd": ["/usr/local/bin/kubeadm", "--kubeconfig", "/etc/kubernetes/admin.conf", "token", "create"],
    "delta": "0:01:15.109430",
    "end": "2023-07-04 02:54:04.922118",
    "invocation": {
        "module_args": {
            "_raw_params": "/usr/local/bin/kubeadm --kubeconfig /etc/kubernetes/admin.conf token create",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true
        }
    },
    "msg": "non-zero return code",
    "rc": 1,
    "start": "2023-07-04 02:52:49.812688",
    "stderr": "timed out waiting for the condition\nTo see the stack trace of this error execute with --v=5 or higher",
    "stderr_lines": ["timed out waiting for the condition", "To see the stack trace of this error execute with --v=5 or higher"],
    "stdout": "",
    "stdout_lines": []

journalctl shows something wrong with the configuration of runc:

sudo journalctl -xeu kubelet | grep failed
Jul 04 16:24:50 node1 kubelet[127545]: E0704 16:24:50.524302 127545 remote_runtime.go:176] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/363edeefd37098196f7b4bd3baa2253e932f3501bdd97b083d0c8fceba6138e7/log.json: no such file or directory): exec: "runc": executable file not found in $PATH: unknown"
Jul 04 16:24:50 node1 kubelet[127545]: E0704 16:24:50.524363 127545 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/363edeefd37098196f7b4bd3baa2253e932f3501bdd97b083d0c8fceba6138e7/log.json: no such file or directory): exec: "runc": executable file not found in $PATH: unknown" pod="kube-system/kube-apiserver-node1"
Jul 04 16:24:50 node1 kubelet[127545]: E0704 16:24:50.524386 127545 kuberuntime_manager.go:782] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/363edeefd37098196f7b4bd3baa2253e932f3501bdd97b083d0c8fceba6138e7/log.json: no such file or directory): exec: "runc": executable file not found in $PATH: unknown" pod="kube-system/kube-apiserver-node1"
Jul 04 16:24:50 node1 kubelet[127545]: E0704 16:24:50.524432 127545 pod_workers.go:965] "Error syncing pod, skipping" err="failed to "CreatePodSandbox" for "kube-apiserver-node1_kube-system(c4b89dde2a5c1b5d448fe0f03d05baa8)" with CreatePodSandboxError: "Failed to create sandbox for pod \"kube-apiserver-node1_kube-system(c4b89dde2a5c1b5d448fe0f03d05baa8)\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/363edeefd37098196f7b4bd3baa2253e932f3501bdd97b083d0c8fceba6138e7/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH: unknown"" pod="kube-system/kube-apiserver-node1" podUID=c4b89dde2a5c1b5d448fe0f03d05baa8
Jul 04 16:24:50 node1 kubelet[127545]: E0704 16:24:50.619838 127545 controller.go:146] failed to ensure lease exists, will retry in 7s, error: Get "https://192.168.1.73:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/node1?timeout=10s": dial tcp 192.168.1.73:6443: connect: connection refused
Jul 04 16:24:52 node1 kubelet[127545]: W0704 16:24:52.667772 127545 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://192.168.1.73:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 192.168.1.73:6443: connect: connection refused
Jul 04 16:24:52 node1 kubelet[127545]: E0704 16:24:52.667836 127545 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://192.168.1.73:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 192.168.1.73:6443: connect: connection refused
Jul 04 16:24:52 node1 kubelet[127545]: W0704 16:24:52.675521 127545 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.CSIDriver: Get "https://192.168.1.73:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp 192.168.1.73:6443: connect: connection refused
Jul 04 16:24:52 node1 kubelet[127545]: E0704 16:24:52.675585 127545 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://192.168.1.73:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp 192.168.1.73:6443: connect: connection refused
Jul 04 16:24:53 node1 kubelet[127545]: E0704 16:24:53.520427 127545 remote_runtime.go:176] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/d4fb1e974177c6372785c1b4a8e242e55516580b9309a1407fc470f106387820/log.json: no such file or directory): exec: "runc": executable file not found in $PATH: unknown"
Jul 04 16:24:53 node1 kubelet[127545]: E0704 16:24:53.520464 127545 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/d4fb1e974177c6372785c1b4a8e242e55516580b9309a1407fc470f106387820/log.json: no such file or directory): exec: "runc": executable file not found in $PATH: unknown" pod="kube-system/kube-controller-manager-node1"
Jul 04 16:24:53 node1 kubelet[127545]: E0704 16:24:53.520486 127545 kuberuntime_manager.go:782] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/d4fb1e974177c6372785c1b4a8e242e55516580b9309a1407fc470f106387820/log.json: no such file or directory): exec: "runc": executable file not found in $PATH: unknown" pod="kube-system/kube-controller-manager-node1"
Jul 04 16:24:53 node1 kubelet[127545]: E0704 16:24:53.520525 127545 pod_workers.go:965] "Error syncing pod, skipping" err="failed to "CreatePodSandbox" for "kube-controller-manager-node1_kube-system(84983840101f64a28c6328ab55dc5c58)" with CreatePodSandboxError: "Failed to create sandbox for pod \"kube-controller-manager-node1_kube-system(84983840101f64a28c6328ab55dc5c58)\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/d4fb1e974177c6372785c1b4a8e242e55516580b9309a1407fc470f106387820/log.json: no such file or directory): exec: \"runc\": executable file not found in $PATH: unknown"" pod="kube-system/kube-controller-manager-node1" podUID=84983840101f64a28c6328ab55dc5c58
Jul 04 16:24:57 node1 kubelet[127545]: E0704 16:24:57.620471 127545 controller.go:146] failed to ensure lease exists, will retry in 7s, error: Get "https://192.168.1.73:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/node1?timeout=10s": dial tcp 192.168.1.73:6443: connect: connection refused

My guess is that it's related to the missing nerdctl issue - the playbook seems to skip over the configuration steps for nerdctl, crictl and possibly runc.
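
A quick way to confirm which of these binaries actually made it onto the nodes is a small check against the cluster group (a minimal sketch, assuming an inventory group named k8s_cluster; adjust to your inventory):

- name: Check which container runtime binaries kubespray installed
  hosts: k8s_cluster
  gather_facts: false
  become: true
  tasks:
    - name: Stat the binaries that should live in /usr/local/bin
      ansible.builtin.stat:
        path: "/usr/local/bin/{{ item }}"
      loop:
        - nerdctl
        - crictl
        - runc
      register: bin_stat

    - name: Report which binaries are present
      ansible.builtin.debug:
        msg: "{{ item.item }} present: {{ item.stat.exists }}"
      loop: "{{ bin_stat.results }}"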

wolskies avatar Jul 04 '23 16:07 wolskies

Just to be clear, this is a total show-stopper for the use many (I'd argue most) people will have for on-premise kubespray at present. I'm trying to use this to build a cluster with Calico on Ubuntu; quite vanilla, really... How are no regression tests covering this? I've spent hours trying to figure out how those steps are being "skipped", and from what I can tell, it's not that they're skipped, it's that the configuration happens much later...

blackmesa-peterdohm avatar Jul 11 '23 01:07 blackmesa-peterdohm

FALSE ALARM. I'd run Ansible outside the virtual environment. So, this is a very curious failure mode that occurs if you do what I just did, in case anyone else runs into this...

blackmesa-peterdohm avatar Jul 11 '23 01:07 blackmesa-peterdohm

Got the same error with the master branch and Debian 12.

slappyslap avatar Jul 23 '23 11:07 slappyslap

From my perspective, it isn't a false alarm. I ran Ansible per the installation instructions, from inside the venv, and it continues to fail to configure nerdctl etc. I've tried with Debian 12 and Oracle/Rocky and get the same behavior - both on "bare metal" and in VMs.

wolskies avatar Jul 23 '23 21:07 wolskies

Same on Debian 11 with the master branch.

slappyslap avatar Jul 24 '23 14:07 slappyslap

Getting the same error on Ubuntu 20.04.

Khodesaeed avatar Jul 25 '23 06:07 Khodesaeed

Faced a similar problem with VBox. Quick fix that helped in my case:

- name: Configure hosts
  gather_facts: False
  hosts: k8s_cluster
  tasks:
    - name: Create a symbolic link
      ansible.builtin.file:
        src: /tmp/releases/crictl
        dest: /usr/local/bin/crictl
        state: link
        force: true

    - name: Create a symbolic link
      ansible.builtin.file:
        src: /tmp/releases/nerdctl
        dest: /usr/local/bin/nerdctl
        state: link
        force: true

    - name: Create a symbolic link
      ansible.builtin.file:
        src: /tmp/releases/runc-v1.1.7.amd64
        dest: /usr/local/bin/runc
        state: link
        force: true

Just add this to playbooks/cluster.yml

Somehow, Kubespray doesn't copy nerdctl, crictl and runc to /usr/local/bin, so I just make soft links.
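
A copy-based variant of the same workaround may be safer than symlinking into /tmp (which can be cleaned up on reboot). This is only a sketch of the idea, not the kubespray-maintained fix; the runc filename under /tmp/releases is version-specific:

- name: Copy downloaded binaries into place
  hosts: k8s_cluster
  gather_facts: false
  become: true
  tasks:
    - name: Copy nerdctl, crictl and runc from the download cache
      ansible.builtin.copy:
        src: "/tmp/releases/{{ item.src }}"
        dest: "/usr/local/bin/{{ item.dest }}"
        mode: "0755"
        remote_src: true  # the files were already downloaded to the node by kubespray
      loop:
        - { src: nerdctl, dest: nerdctl }
        - { src: crictl, dest: crictl }
        - { src: runc-v1.1.7.amd64, dest: runc }  # adjust to the actual filename in /tmp/releases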

Mishavint avatar Jul 28 '23 10:07 Mishavint

After some investigation, my guess is that the dependency roles of container-engine (here, containerd) somehow don't run after containerd is selected as the CRI. According to the Ansible documentation about role dependencies (link): "Role dependencies let you automatically pull in other roles when using a role."
The doc also says: "Ansible always executes roles listed in dependencies before the role that lists them."
Moreover, you can find the containerd (or any other CRI-related) role dependencies at roles/container-engine/meta/main.yml; the snippet below is the part related to containerd:

---
dependencies:
...
  - role: container-engine/containerd
    when:
      - container_manager == 'containerd'
    tags:
      - container-engine
      - containerd

Following the same pattern, this role has its own role dependencies, and at this point the runc, crictl, and nerdctl related tasks should run, but they don't. The role-dependency meta file is at roles/container-engine/containerd/meta/main.yml:

---
dependencies:
  - role: container-engine/containerd-common
  - role: container-engine/runc
  - role: container-engine/crictl
  - role: container-engine/nerdctl

So, here is my quick fix:
I added the required roles to the role-dependency meta file at roles/container-engine/meta/main.yml, before the containerd section:

---
dependencies:
...
  - role: container-engine/runc
    when:
      - container_manager == 'containerd'

  - role: container-engine/nerdctl
    when:
      - container_manager == 'containerd'
  
  - role: container-engine/crictl
    when:
      - container_manager == 'containerd'

  - role: container-engine/containerd
    when:
      - container_manager == 'containerd'
    tags:
      - container-engine
      - containerd

P.S. After some more investigation I found another bug, which I think was my main issue: after using the reset.yml playbook to reset the cluster, some container processes still remain, and only after killing those containers did I finally manage to deploy my cluster with Kubespray.

Khodesaeed avatar Jul 29 '23 07:07 Khodesaeed

Thanks @Khodesaeed @roboticsbrian

I cannot find the root cause of the issue. Would you help us reproduce it?

Which config file, Kubespray commit, and OS are used, and are there any important steps to reproduce the issue?

yankay avatar Aug 16 '23 03:08 yankay

Hi,

After some investigation, it could be linked to how dependencies work: the behavior is not uniform across Ansible versions when using when. These Ansible issues could be relevant:

  • https://github.com/ansible/ansible/issues/81486
  • https://github.com/ansible/ansible/issues/81040

this is normal and expected behavior for meta dependencies, de-duplication is done on the 'call signature' of the role itself. If you want finer grained control I would recommend using include_role instead.

I've started replacing all dependencies with include_role and import_role to avoid this. I can do a PR if you think this is the right approach @yankay.
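
For illustration, a minimal sketch of that idea (role names are taken from the snippets above; the loop structure is mine, not the final PR): because meta dependencies are de-duplicated on the role's call signature, a when-guarded dependency can be silently skipped, whereas include_role evaluates the condition at the point of use:

# e.g. in roles/container-engine/tasks/main.yml (sketch)
- name: Pull in containerd and its companion tools explicitly
  ansible.builtin.include_role:
    name: "container-engine/{{ item }}"
  loop:
    - runc
    - crictl
    - nerdctl
    - containerd
  when: container_manager == 'containerd'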

RomainMou avatar Aug 16 '23 08:08 RomainMou

Thanks @RomainMou

I do not know how to reproduce it, so for now I have no idea whether it is the right approach. :-) Can the issue be reproduced with ansible >= [core 2.15.x]?

yankay avatar Aug 16 '23 08:08 yankay

Yes @yankay, I've reproduced it on a new cluster installation with:

ansible==8.2.0
ansible-core==2.15.3

RomainMou avatar Aug 16 '23 08:08 RomainMou

Thank you @RomainMou

I upgraded Ansible to

ansible==8.3.0

and the issue is reproduced:

fatal: [kay171]: FAILED! => {"attempts": 4, "changed": false, "cmd": "/usr/local/bin/nerdctl -n k8s.io pull --quiet quay.m.daocloud.io/calico/node:v3.25.1", "msg": "[Errno 2] No such file or directory: b'/usr/local/bin/nerdctl'", "rc": 2, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [kay172]: FAILED! => {"attempts": 4, "changed": false, "cmd": "/usr/local/bin/nerdctl -n k8s.io pull --quiet quay.m.daocloud.io/calico/node:v3.25.1", "msg": "[Errno 2] No such file or directory: b'/usr/local/bin/nerdctl'", "rc": 2, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

I think supporting a new Ansible version is very good for Kubespray. A PR to fix it is very welcome.

@MrFreezeex @floryut, would you please give some suggestions? :-) Thanks.

yankay avatar Aug 16 '23 08:08 yankay

ansible -i inventory/mycluster/inventory.ini -u ubuntu --private-key=~/.ssh/id_rsa --become --become-user=root -b -m copy -a "src=/tmp/releases/nerdctl dest=/usr/local/bin/nerdctl mode=0755 remote_src=yes" all


ansible -i inventory/mycluster/inventory.ini -u ubuntu --private-key=~/.ssh/id_rsa --become --become-user=root -b -m copy -a "src=/tmp/releases/crictl dest=/usr/local/bin/crictl mode=0755 remote_src=yes" all

These lines fixed it on all nodes.

bugaian avatar Aug 18 '23 15:08 bugaian

It is reproducible with Ansible 2.15.4. Today I hit this error.

The full traceback is:
  File "/tmp/ansible_ansible.legacy.command_payload_nox4f_k6/ansible_ansible.legacy.command_payload.zip/ansible/module_utils/basic.py", line 2038, in run_command
    cmd = subprocess.Popen(args, **kwargs)
  File "/usr/lib64/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib64/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
fatal: [kvt0labvrpa0049]: FAILED! => {
    "attempts": 4,
    "changed": false,
    "cmd": "/usr/local/bin/nerdctl -n k8s.io pull --quiet quay.io/calico/node:v3.26.3",
    "invocation": {
        "module_args": {
            "_raw_params": "/usr/local/bin/nerdctl -n k8s.io pull --quiet quay.io/calico/node:v3.26.3",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true
        }
    },
    "msg": "[Errno 2] No such file or directory: b'/usr/local/bin/nerdctl': b'/usr/local/bin/nerdctl'",
    "rc": 2,
    "stderr": "",
    "stderr_lines": [],
    "stdout": "",
    "stdout_lines": []
}

vyom-soft avatar Nov 08 '23 11:11 vyom-soft

It is reproducible with Ansible 2.15.4. Today I hit this error.

Hi! Not sure how you launched kubespray with ansible 2.15.4, but we definitely do not support this version! Please use requirements.txt to install your ansible version.

MrFreezeex avatar Nov 08 '23 11:11 MrFreezeex

Still reproducible with the latest master?

VannTen avatar Jan 22 '24 10:01 VannTen

/triage not-reproducible
I could not reproduce this on master (please provide a reproducer if that's incorrect).

VannTen avatar Feb 08 '24 08:02 VannTen

May be connected to the issue: after a clean installation on Oracle Linux 9, /usr/local/bin was simply not present in $PATH, so I couldn't use binaries (nerdctl included) from my user without specifying the full path to them. This did not affect the installation process, though - everything worked as expected.

user81230 avatar May 31 '24 13:05 user81230