kubespray
kubespray copied to clipboard
speed up by removing unneeded dependency of kubespray-defaults on downloads
What type of PR is this? /kind cleanup
What this PR does / why we need it:
At one point in the past it may have been that this dependency of defaults on downloads was needed to set a fact, but as far as I can tell it should not be anymore. The dependency is invoked with skip_downloads: true
, so none of the tasks in https://github.com/kubernetes-sigs/kubespray/blob/master/roles/download/tasks/main.yml will be executed - but they do all still get dynamically imported and processed so Ansible wastes a lot of time churning through each one of them.
Which issue(s) this PR fixes: Fixes https://github.com/kubernetes-sigs/kubespray/issues/9279
Special notes for your reviewer: See timing measurements posted below.
Does this PR introduce a user-facing change?: No
Hi @rptaylor. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test
on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test
label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: rptaylor
Once this PR has been reviewed and has the lgtm label, please assign miouge1 for approval by writing /assign @miouge1
in a comment. For more information see:The Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve
in a comment
Approvers can cancel approval by writing /approve cancel
in a comment
I used this test.yml play to evaluate the impact of the change. This is a sort of benchmark that represents 2 out of the 12 times that the cluster.yml play invokes the kubespray-defaults role.
- hosts: kube_control_plane
gather_facts: False
any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
environment: "{{ proxy_disable_env }}"
roles:
- { role: kubespray-defaults }
- hosts: k8s_cluster
gather_facts: False
any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
environment: "{{ proxy_disable_env }}"
roles:
- { role: kubespray-defaults }
On a small dev cluster with 6 nodes: at first test.yml took 17 s to run. After removing the dependency it took 4s to run.
On a large cluster with 150 nodes, at first test.yml took 1 minute to run. The output shows all the download and container-engine tasks that were being imported, not doing anything except wasting time:
Thursday 15 September 2022 20:37:06 +0000 (0:00:03.117) 0:00:57.951 ****
===============================================================================
kubespray-defaults : Gather ansible_default_ipv4 from all hosts ------------------------------------------------------------------------------------------------------ 3.28s
kubespray-defaults : set fallback_ips -------------------------------------------------------------------------------------------------------------------------------- 3.26s
kubespray-defaults : Populates no_proxy to all hosts ----------------------------------------------------------------------------------------------------------------- 3.12s
kubespray-defaults : Configure defaults ------------------------------------------------------------------------------------------------------------------------------ 3.08s
prep_download | Set image pull/info command for docker on localhost -------------------------------------------------------------------------------------------------- 2.97s
prep_download | Parse the outputs of the previous commands ----------------------------------------------------------------------------------------------------------- 2.95s
prep_download | Set image pull/info command for containerd and crio -------------------------------------------------------------------------------------------------- 2.93s
prep_download | Set image pull/info command for docker --------------------------------------------------------------------------------------------------------------- 2.93s
container-engine/nerdctl : nerdctl | Download nerdctl ---------------------------------------------------------------------------------------------------------------- 2.93s
prep_download | Register docker images info -------------------------------------------------------------------------------------------------------------------------- 2.90s
prep_download | Create staging directory on remote node -------------------------------------------------------------------------------------------------------------- 2.88s
download | Get kubeadm binary and list of required images ------------------------------------------------------------------------------------------------------------ 2.85s
prep_download | Check that local user is in group or can become root ------------------------------------------------------------------------------------------------- 2.79s
container-engine/crictl : install crictĺ ----------------------------------------------------------------------------------------------------------------------------- 2.77s
download | Download files / images ----------------------------------------------------------------------------------------------------------------------------------- 2.76s
container-engine/nerdctl : nerdctl | Copy nerdctl binary from download dir ------------------------------------------------------------------------------------------- 2.76s
prep_download | Set a few facts -------------------------------------------------------------------------------------------------------------------------------------- 2.72s
prep_download | Set image pull/info command for containerd and crio on localhost ------------------------------------------------------------------------------------- 2.65s
kubespray-defaults : Gather ansible_default_ipv4 from all hosts ------------------------------------------------------------------------------------------------------ 1.71s
kubespray-defaults : create fallback_ips_base ------------------------------------------------------------------------------------------------------------------------ 0.85s
After removing the dependency, it took only 15 seconds, and the output shows that only the kubespray-defaults tasks are running:
Thursday 15 September 2022 20:38:54 +0000 (0:00:02.801) 0:00:15.947 ****
===============================================================================
kubespray-defaults : set fallback_ips -------------------------------------------------------------------------------------------------------------------------------- 3.65s
kubespray-defaults : Gather ansible_default_ipv4 from all hosts ------------------------------------------------------------------------------------------------------ 3.22s
kubespray-defaults : Configure defaults ------------------------------------------------------------------------------------------------------------------------------ 2.80s
kubespray-defaults : Populates no_proxy to all hosts ----------------------------------------------------------------------------------------------------------------- 2.80s
kubespray-defaults : Gather ansible_default_ipv4 from all hosts ------------------------------------------------------------------------------------------------------ 1.80s
kubespray-defaults : create fallback_ips_base ------------------------------------------------------------------------------------------------------------------------ 0.78s
kubespray-defaults : Configure defaults ------------------------------------------------------------------------------------------------------------------------------ 0.19s
kubespray-defaults : create fallback_ips_base ------------------------------------------------------------------------------------------------------------------------ 0.18s
kubespray-defaults : set fallback_ips -------------------------------------------------------------------------------------------------------------------------------- 0.18s
kubespray-defaults : Populates no_proxy to all hosts ----------------------------------------------------------------------------------------------------------------- 0.12s
kubespray-defaults : Set no_proxy to all assigned cluster IPs and hostnames ------------------------------------------------------------------------------------------ 0.10s
kubespray-defaults : Set no_proxy to all assigned cluster IPs and hostnames ------------------------------------------------------------------------------------------ 0.08s
So this should make defaults faster by a factor of 4 regardless of cluster size.
@cristicalin or @oomichi could you please take a look?
fatal: [ubuntu20]: FAILED! => {"msg": "The conditional check 'download_localhost' failed. The error was: error while evaluating conditional (download_localhost): 'download_localhost' is undefined\n\nThe error appears to be in '/builds/kargo-ci/kubernetes-sigs-kubespray/roles/kubernetes/preinstall/tasks/0020-verify-settings.yml': line 244, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Stop if download_localhost is enabled but download_run_once is not\n ^ here\n"}
I guess a lot of task will fail, the issue is that download role ship a lot of version for component as variable..
Interesting ... I don't understand how it could be doing anything with the equivalent of when: false
on every task but I'll take a closer look.
Oh of course, the role tasks don't execute but the role defaults still get loaded .... need a way to do one without the other... I didn't realize the role was only being included for its default variables but there should be a better way to do that ...
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity,
lifecycle/stale
is applied - After 30d of inactivity since
lifecycle/stale
was applied,lifecycle/rotten
is applied - After 30d of inactivity since
lifecycle/rotten
was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with
/remove-lifecycle stale
- Mark this issue or PR as rotten with
/lifecycle rotten
- Close this issue or PR with
/close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity,
lifecycle/stale
is applied - After 30d of inactivity since
lifecycle/stale
was applied,lifecycle/rotten
is applied - After 30d of inactivity since
lifecycle/rotten
was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with
/remove-lifecycle rotten
- Close this issue or PR with
/close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
@rptaylor You may rebase master as if the CI is KO this PR won't be merged
The download
role is actually critical and will require refactoring to properly remove this dependency. In essence, download/defaults/main.yml
contains all the versions definitions used in kubespray (>90% anyway) and it is imported throughout the code when we need to do comparisons or download stuff. We also call individual task files from this role in other roles to download files which means this role becomes a dependency of those roles ... in short this part of the code is extremely ugly and requires a lot of cleanup to speed it up properly.
Your goal is noble @rptaylor but I'm afraid the road to get there is much more complicated than just deleting a few lines of code.
Yes, that is what I gathered in my previous comment. It is unusual to have a role that only sets variables. And the playbook is a large number of plays. In principle it should be possible for all the variable definitions to be moved to appropriate group vars with hopefully not too many if/else statements inside the variable definitions but it would be a big project.