
Custom binary directory (bin_dir) bug

Open sohnaeo opened this issue 2 years ago • 5 comments

Environment:

  • Cloud provider or hardware configuration:

AWS

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):

PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

  • Version of Ansible (ansible --version):

ansible [core 2.12.8]
python version = 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0]
jinja version = 3.1.2
libyaml = False

  • Version of Python (python --version): Python 3.10.4

Kubespray version (commit) (git rev-parse --short HEAD):

Master Branch 6db6c867

Network plugin used:

Calico

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

Command used to invoke ansible:

ansible-playbook -i ./inventory/
--become
--private-key=$ANSIBLE_PRIVATE_KEY
-e ansible_ssh_user=$ANSIBLE_SSH_USER
cluster.yml

Output of ansible run:

Anything else we need to know: We have to install all the binaries into a different location than /usr/local/bin, so we changed the variable

bin_dir: /opendata/bin
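For reference, a minimal sketch of how we set that override (assuming the usual sample inventory layout; the exact group_vars file path is illustrative):

# inventory/mycluster/group_vars/all/all.yml (illustrative path)
# install kubespray-managed binaries under a custom directory instead of /usr/local/bin
bin_dir: /opendata/bin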

Kubespray installation failed when it started initializing the master, as it couldn't find runc:

Sep 1 02:59:09 ip-10-2-70-201 kubelet[972]: E0901 02:59:09.994499 972 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-scheduler-ip-10-2-70-201.ap-southeast-2.compute.internal_kube-system(544d88b8eff97d016f13939da5fd6ceb)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"kube-scheduler-ip-10-2-70-201.ap-southeast-2.compute.internal_kube-system(544d88b8eff97d016f13939da5fd6ceb)\\\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /opendev/run/containerd/io.containerd.runtime.v2.task/k8s.io/8d8a06167b7b047cd800a53150c3c7ecfa5b31f2ec6cb3a25b2a0fe2df582fb7/log.json: no such file or directory): exec: \\\"runc\\\": executable file not found in $PATH: unknown\"" pod="kube-system/kube-scheduler-ip-10-2-70-201.ap-southeast-2.compute.internal" podUID=544d88b8eff97d016f13939da5fd6ceb

While it was waiting during initialization, I manually copied all the binaries from /opendata/bin to /usr/local/bin and then it worked. After the installation, I reproduced the issue by removing the binaries from /usr/local/bin/* and the problem came back. I added "/opendata/bin" to /etc/environment to make sure all users have it in their PATH and tested that successfully, but somehow it didn't help.

I checked /etc/kubernetes/kubelet.env and it has the correct path:

PATH=/opendata/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

But it seems like a bug to me; /usr/local/bin appears to be hardcoded somewhere that I can't find in the repo.
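One way to confirm which PATH the container runtime itself is actually running with (just a diagnostic sketch, assuming the containerd daemon is up and pidof resolves it) is to read the process environment directly:

# print the PATH seen by the running containerd process
sudo cat /proc/$(pidof containerd)/environ | tr '\0' '\n' | grep '^PATH='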

I am more than happy to help to reproduce the issue.

sohnaeo avatar Sep 01 '22 03:09 sohnaeo

I investigated a bit further. As per my understanding, containerd-shim is the one calling runc when needed, therefore it always needs to find it in its PATH. containerd-shims inherit their PATH from containerd. If we opt to install runc in a bin_dir other than /bin or /usr/local/bin, what should we do so that containerd-shim can call runc from the different location?

If we choose a "bin_dir" other than /usr/local/bin, the kubespray installation breaks, so we need to find a way to fix this.
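Perhaps one workaround (untested on my side; a sketch assuming containerd runs as a systemd unit named containerd.service) would be to extend the PATH that containerd, and therefore its shims, inherit via a systemd drop-in:

# /etc/systemd/system/containerd.service.d/path.conf (illustrative drop-in)
[Service]
Environment="PATH=/opendata/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

followed by systemctl daemon-reload and systemctl restart containerd.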

sohnaeo avatar Sep 05 '22 13:09 sohnaeo

I believe I found a solution for this; I used the variable below to fix it:

containerd_extra_args: |
  [plugins."io.containerd.internal.v1.opt"]
    path = "/opendata"

By default, containerd looks in /usr/bin or /opt/containerd to find the runc command, so if binaries are installed in a different location, containerd needs to know about it. You don't need to include "bin" in the path. For example, if all binaries are installed under /opendata/bin, the lines below at the end of /etc/containerd/config.toml would fix the issue:

[plugins."io.containerd.internal.v1.opt"]
  path = "/opendata"
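To double-check that the option was picked up, dumping the effective configuration (containerd config dump prints the merged config) and looking for the opt plugin should show the new path:

# print the effective containerd configuration and check the opt plugin path
containerd config dump | grep -A 2 'io.containerd.internal.v1.opt'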

I believe the kubespray code or documentation should address this issue. If a user installs the binaries into a different location, the installation will break for sure, so users should be aware of this step.

sohnaeo avatar Sep 07 '22 01:09 sohnaeo

Below is the link that explains it

https://github.com/containerd/containerd/blob/main/docs/managed-opt.md

For users who are using containerd for the first time and set bin_dir to a different directory, the installation will break for sure. I believe that when a user sets the bin_dir variable to anything other than /usr/local/bin, the user needs to be warned, or the snippet below needs to be added to config.toml for containerd.

[plugins."io.containerd.internal.v1.opt"]
  path = "/opt/mypath"

sohnaeo avatar Sep 08 '22 23:09 sohnaeo

Great investigation @sohnaeo, I guess we should fix that

floryut avatar Sep 20 '22 08:09 floryut

@floryut

Definitely, we need to fix it. Let me know if I can help in any capacity.

sohnaeo avatar Sep 20 '22 12:09 sohnaeo

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 19 '22 12:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jan 18 '23 13:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Feb 17 '23 14:02 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 17 '23 14:02 k8s-ci-robot