
download role executed too early when CRI-O instructions are followed

Open dkasanic opened this issue 1 year ago • 10 comments

Environment:

  • Cloud provider or hardware configuration: amd64
  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"): Linux 5.15.0-25-generic x86_64 PRETTY_NAME="Ubuntu 22.04 LTS"
  • Version of Ansible (ansible --version): ansible==8.6.1 ansible-core==2.15.6
  • Version of Python (python --version): 3.10.12

Kubespray version (commit) (git rev-parse --short HEAD): 3acacc615

Network plugin used: calico

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

skip_downloads: false

Output of ansible run:

2023-11-23 00:44:05,460 p=2875882 u=root n=ansible | TASK [kubernetes_sigs.kubespray.download : Prep_kubeadm_images | Create kubeadm config] ***
2023-11-23 00:44:05,461 p=2875882 u=root n=ansible | fatal: [aj09-17-dell-spr]: FAILED! => {
    "changed": false,
    "checksum": "601e28489b672da953a83dc549261b385c01a692"
}

MSG:

Destination directory /etc/kubernetes does not exist

Anything else do we need to know: The download role got called as a dependency of the kubespray-defaults role, but the skip_downloads: true var defined in meta/main.yml was not applied. That results in items being downloaded early on, when the /etc/kubernetes directory does not exist yet.
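
For context, the failing task is essentially a template write into the Kubernetes config directory; the pattern is roughly the following (the template and variable names are a sketch from memory, not copied from the Kubespray source):

- name: Prep_kubeadm_images | Create kubeadm config
  ansible.builtin.template:
    src: kubeadm-images.yaml.j2                          # assumed template name
    dest: "{{ kube_config_dir }}/kubeadm-images.yaml"    # kube_config_dir normally resolves to /etc/kubernetes
    mode: "0644"

ansible.builtin.template fails with "Destination directory ... does not exist" when the parent directory of dest is missing, which matches the error above: nothing has created /etc/kubernetes yet at that point of the run.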

dkasanic avatar Nov 27 '23 17:11 dkasanic

Hi @dkasanic

Thanks for the issue and PR. Could you please give more information about the Kubespray or Ansible config to reproduce the error?

It's very helpful :-)

Thank you :-)

yankay avatar Nov 28 '23 02:11 yankay

Hello, @yankay

In my env, I install Kubespray as a Galaxy collection and then import cluster.yml. To reproduce the error, I believe the following snippet of tasks from my playbook is enough (a fuller, self-contained sketch follows after the note below):

- name: add crio runtime vars
  set_fact:
    container_manager: crio
    download_container: false
    skip_downloads: false
    etcd_deployment_type: host
- name: Deploy cluster via Kubespray
  any_errors_fatal: true
  ansible.builtin.import_playbook: kubernetes_sigs.kubespray.cluster
  • the error was not encountered before the ansible-core upgrade from 2.14.9 to 2.15.6
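
For completeness, a minimal self-contained sketch of such a wrapper playbook (the file name, play name and hosts pattern are illustrative only):

# repro-playbook.yml (hypothetical file name)
- name: Add CRI-O runtime vars
  hosts: all                      # assumed host pattern; the real inventory groups may differ
  gather_facts: false
  tasks:
    - name: Set runtime facts consumed by Kubespray
      ansible.builtin.set_fact:
        container_manager: crio
        download_container: false
        skip_downloads: false
        etcd_deployment_type: host

# Import the collection's cluster playbook afterwards, as in the snippet above
- name: Deploy cluster via Kubespray
  ansible.builtin.import_playbook: kubernetes_sigs.kubespray.cluster

Assuming the collection is installed via ansible-galaxy, running this with ansible-playbook -i <inventory> should be enough to hit the early download on ansible-core 2.15.6.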

It seems that in such a case, the skip_downloads: true var definition in the meta/main.yml file does not kick in properly, and the download role starts downloading items, which should not happen at this stage of the run. It should happen after the kubespray-defaults role has executed and the download role is called from the cluster.yml playbook.

As soon as I removed the skip_downloads: false var definition from the set_fact task, the deployment started working correctly. The problem is in the following meta/main.yml file:

dependencies:
  - role: download
    skip_downloads: true
    tags:
      - facts

As per the Ansible docs, it should be defined as:

dependencies:
  - role: download
    vars:
      skip_downloads: true
    tags:
      - facts

dkasanic avatar Nov 28 '23 08:11 dkasanic

Does #10626 fix your problem? (since download is no longer pulled in by kubespray-defaults)

VannTen avatar Nov 29 '23 17:11 VannTen

Is the problem still present on master? I believe the PR linked in the previous message might have fixed the issue:

The download role got called as a dependency of the kubespray-defaults role, but the skip_downloads: true var defined in meta/main.yml was not applied. That results in items being downloaded early on, when the /etc/kubernetes directory does not exist yet.

(Since this is no longer true)
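
(If it helps for testing: when Kubespray is consumed as a collection, one way to try master is a git-sourced entry in requirements.yml; the snippet below is only a sketch and assumes the upstream repository.)

# requirements.yml - install the collection from the master branch (sketch)
collections:
  - name: https://github.com/kubernetes-sigs/kubespray.git
    type: git
    version: master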

VannTen avatar Jan 16 '24 13:01 VannTen

/triage needs-information

VannTen avatar Jan 30 '24 09:01 VannTen

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 29 '24 10:04 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar May 29 '24 10:05 k8s-triage-robot

/remove-lifecycle rotten

vaibhav2107 avatar Jun 08 '24 19:06 vaibhav2107

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 06 '24 20:09 k8s-triage-robot