Custom binary directory (bin_dir) bug
Environment:
- Cloud provider or hardware configuration: AWS
- OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
  PRETTY_NAME="Ubuntu 22.04.1 LTS" NAME="Ubuntu" VERSION_ID="22.04" VERSION="22.04.1 LTS (Jammy Jellyfish)" VERSION_CODENAME=jammy ID=ubuntu ID_LIKE=debian HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" UBUNTU_CODENAME=jammy
- Version of Ansible (ansible --version): ansible [core 2.12.8], python version = 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0], jinja version = 3.1.2, libyaml = False
- Version of Python (python --version): Python 3.10.4
- Kubespray version (commit) (git rev-parse --short HEAD): master branch, 6db6c867
- Network plugin used: Calico
- Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):
Command used to invoke ansible:
ansible-playbook -i ./inventory/ --become --private-key=$ANSIBLE_PRIVATE_KEY -e ansible_ssh_user=$ANSIBLE_SSH_USER cluster.yml
Output of ansible run:
Anything else we need to know: We have to install all the binaries into a different location than /usr/local/bin, so we changed the variable:

bin_dir: /opendata/bin
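For reference, the override lives in the inventory group_vars; the file path below is only illustrative (use whatever your inventory layout is):

# e.g. inventory/mycluster/group_vars/all/all.yml (illustrative path)
# Kubespray installs kubelet, kubectl, runc, containerd, etc. under this directory
bin_dir: /opendata/bin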
Kubespray installation failed when it started initializing the master, because it could not find runc:
Sep 1 02:59:09 ip-10-2-70-201 kubelet[972]: E0901 02:59:09.994499 972 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-scheduler-ip-10-2-70-201.ap-southeast-2.compute.internal_kube-system(544d88b8eff97d016f13939da5fd6ceb)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"kube-scheduler-ip-10-2-70-201.ap-southeast-2.compute.internal_kube-system(544d88b8eff97d016f13939da5fd6ceb)\\\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /opendev/run/containerd/io.containerd.runtime.v2.task/k8s.io/8d8a06167b7b047cd800a53150c3c7ecfa5b31f2ec6cb3a25b2a0fe2df582fb7/log.json: no such file or directory): exec: \\\"runc\\\": executable file not found in $PATH: unknown\"" pod="kube-system/kube-scheduler-ip-10-2-70-201.ap-southeast-2.compute.internal" podUID=544d88b8eff97d016f13939da5fd6ceb
While it was waiting on initialization, I manually copied all the binaries from /opendata/bin to /usr/local/bin and then it worked. After the installation, I reproduced the issue by removing the binaries from /usr/local/bin/*, and the problem came back. I added "/opendata/bin" to /etc/environment to make sure all users have it in their PATH, and I tested that successfully, but somehow it didn't help.

I checked /etc/kubernetes/kubelet.env and it has the correct path:

PATH=/opendata/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

But it looks like a bug to me: /usr/local/bin seems to be hardcoded somewhere that I can't find in the repo.
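A quick way to see why the /etc/environment change had no effect is to compare the PATH of a login shell with the PATH of the running containerd daemon; this is just a sketch using standard tools, not something kubespray provides:

# PATH of an interactive shell (this is what /etc/environment influences via pam_env)
echo "$PATH"

# PATH of the running containerd daemon; systemd services normally do not read
# /etc/environment, so /opendata/bin is typically still missing here
tr '\0' '\n' < /proc/"$(pidof containerd)"/environ | grep '^PATH='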
I am more than happy to help reproduce the issue.
I investigated a bit further. As I understand it, containerd-shim is the component that calls runc when needed, so it must always be able to find runc in its PATH, and containerd-shims inherit their PATH from containerd. If we opt to install runc in a bin_dir other than /bin or /usr/local/bin, what should we do so that containerd-shim calls runc from that other location? (One generic workaround is sketched below.)

If we choose a bin_dir other than /usr/local/bin, the kubespray installation breaks, so we need to find a way to fix this.
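For reference, one generic workaround (a sketch only, not what kubespray does, and the drop-in filename is arbitrary) is to extend containerd's own PATH with a systemd drop-in so that the shims it spawns inherit it:

# Hypothetical systemd drop-in; the filename 10-path.conf is arbitrary
mkdir -p /etc/systemd/system/containerd.service.d
cat <<'EOF' >/etc/systemd/system/containerd.service.d/10-path.conf
[Service]
Environment="PATH=/opendata/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
EOF
systemctl daemon-reload
systemctl restart containerd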
I believe I found a solution for this; I used the variable below to fix it:

containerd_extra_args: |
  [plugins."io.containerd.internal.v1.opt"]
    path = "/opendata"
By default, containerd looks in /usr/bin or /opt/containerd to find the runc command, so if the binaries are installed in a different location, containerd needs to know about it. You don't need to append "bin" to the path. For example, if all binaries are installed under /opendata/bin, adding the lines below at the end of /etc/containerd/config.toml fixes the issue:

[plugins."io.containerd.internal.v1.opt"]
  path = "/opendata"
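To make the mechanism concrete: with that opt path, containerd itself appends "bin" and adds <path>/bin to the PATH it passes to its shims, so the binaries are expected to live under /opendata/bin. A quick check after restarting containerd (a sketch using standard commands):

# Binaries installed by kubespray with bin_dir: /opendata/bin
ls -l /opendata/bin/runc /opendata/bin/containerd-shim-runc-v2

# Restart containerd and confirm the merged config contains the opt path
systemctl restart containerd
containerd config dump | grep -A 2 'io.containerd.internal.v1.opt'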
I believe the kubespray code or documentation should address this issue. If a user installs the binaries into a different location, the installation will break for sure, so users should be aware of this step. Below is the link that explains it:
https://github.com/containerd/containerd/blob/main/docs/managed-opt.md
For users who are using containerd for the first time and set bin_dir to a different directory, the installation will break for sure. I believe that when a user sets the bin_dir variable to anything other than /usr/local/bin, the user should be warned (a rough sketch of such a check follows below), or the snippet below should be added to containerd's config.toml:

[plugins."io.containerd.internal.v1.opt"]
  path = "/opt/mypath"
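As an illustration of the kind of warning kubespray could emit (a sketch only; the task name and where it would live in the roles are hypothetical, while bin_dir and container_manager are existing kubespray variables):

# Hypothetical pre-flight task: warn when bin_dir is non-default and containerd is used,
# because containerd-shim will not find runc unless the managed-opt path is adjusted too.
- name: Warn about non-default bin_dir with containerd
  debug:
    msg: >-
      bin_dir is set to {{ bin_dir }} instead of /usr/local/bin. Make sure containerd's
      [plugins."io.containerd.internal.v1.opt"] path points at {{ bin_dir | dirname }}
      so that containerd-shim can locate runc.
  when:
    - container_manager == 'containerd'
    - bin_dir != '/usr/local/bin'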
Great investigation @sohnaeo, I guess we should fix that.
@floryut
Definitely, we need to fix it. Let me know if I can help in any capacity.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.