image-builder
Fix SSH/scp/SFTP issues for OpenSSH 9.0+ and Flatcar stable
This PR fixes several issues with Flatcar builds. The fixes are bundled together because, at this point, it isn't possible to test them in isolation.
Fixes https://github.com/kubernetes-sigs/image-builder/issues/905
Workaround for https://github.com/kubernetes-sigs/image-builder/issues/859
Hi @kopiczko. Thanks for your PR.
I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
CC @invidian
Unknown CLA label state. Rechecking for CLA labels.
Send feedback to sig-contributor-experience at kubernetes/community.
/check-cla /easycla
I'm testing building and I get this error right now:
sig-flatcar: TASK [python : Get distribution name from lsb-release] *************************
sig-flatcar: fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Unable to negotiate with 127.0.0.1 port 34989: no matching host key type found. Their offer: ssh-rsa", "unreachable": true}
This was with Flatcar versions 3139.2.1 and 3033.2.4.
Both those versions build fine for me :(
I'll do some more testing to try to figure out what's going on :+1:
CC @johananl
/assign
So what is the exact state of this PR? It seems to be fixing two problems (the etcd script, and some SSH improvements?). Those should perhaps be two different PRs. This also has 5 commits, some of which are merge commits (and will block the PR).
Opened https://github.com/kubernetes-sigs/image-builder/pull/923 with just Flatcar build fix.
I'd like to have https://github.com/kubernetes-sigs/image-builder/pull/912 merged first, and then I could squash this PR. Alternatively, I can simply remove the file from this PR.
@codenrhoden I updated the title. After extracting https://github.com/kubernetes-sigs/image-builder/pull/912 and https://github.com/kubernetes-sigs/image-builder/pull/923 the scope is narrowed down to fixing OpenSSH 9.0+ issues and sftp binary in Flatcar. I reflected that in the title. All commits are squashed.
I'd expect at least 3 commits here, though: one refactoring the parameters into a common place, one with the ssh-rsa fix referring to the appropriate issue, and another with the SFTP fix. As merge commits are used in this repository, I think it would be valuable to keep things separate and small, in case some patch needs to be reverted later, or for bisecting.
The commit message is not aligned with the content anymore.
@codenrhoden @invidian from my POV this is ready to be merged. I tested it, split into cherry-pickable commits and rebased.
@aniruddha2000 I added quotes around -O. It was also failing for me. I didn't notice at first because of a quirk in my dev branch.
/unassign @kopiczko /assign @fabriziopandini
I don't have enough context about imageBuilder to validate a change impacting so many providers 😓 It would be great to have folks with specific expertise involved instead /unassign
Looks like the failures are due to some of the SSH changes; both 1804 and 2004 are failing with:
vhd-ubuntu-2004: TASK [include_role : python] ***************************************************
vhd-ubuntu-2004: [WARNING]: raw module does not support the environment keyword
vhd-ubuntu-2004:
vhd-ubuntu-2004: TASK [python : Get distribution name from lsb-release] *************************
vhd-ubuntu-2004: fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: command-line: line 0: Bad configuration option: pubkeyacceptedalgorithms", "unreachable": true}
vhd-ubuntu-2004:
vhd-ubuntu-2004: PLAY RECAP *********************************************************************
vhd-ubuntu-2004: default : ok=0 changed=0 unreachable=1 failed=0 skipped=0
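For context on the failure above: "Bad configuration option" is what an ssh client prints for an option name it does not know, and PubkeyAcceptedAlgorithms only exists since OpenSSH 8.5, which renamed it from PubkeyAcceptedKeyTypes. The following is a minimal sketch, not the change made in this PR, of choosing the option name from the client version. The helper name `ssh_rsa_args` and the `(major, minor)` tuple are hypothetical and used only for illustration.

```python
from typing import List, Tuple


def ssh_rsa_args(client_version: Tuple[int, int]) -> List[str]:
    """Return ssh arguments that re-enable ssh-rsa for the given client version.

    client_version is (major, minor), e.g. (8, 2) for OpenSSH_8.2p1.
    Hypothetical helper for illustration; not part of image-builder.
    """
    # OpenSSH 8.5 renamed PubkeyAcceptedKeyTypes to PubkeyAcceptedAlgorithms;
    # older clients reject the new name as a bad configuration option.
    if client_version >= (8, 5):
        pubkey_option = "PubkeyAcceptedAlgorithms"
    else:
        pubkey_option = "PubkeyAcceptedKeyTypes"
    return [
        "-o", "HostKeyAlgorithms=+ssh-rsa",
        "-o", f"{pubkey_option}=+ssh-rsa",
    ]


if __name__ == "__main__":
    # Print the argument list for an older and a newer client.
    print(" ".join(ssh_rsa_args((8, 2))))
    print(" ".join(ssh_rsa_args((9, 0))))
```

Whether the build should pick the name at run time like this, or simply require a newer client, is a design choice this sketch does not settle.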
@kopiczko do you have any updates on this?
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: invidian, kopiczko
Once this PR has been reviewed and has the lgtm label, please assign fabriziopandini for approval by writing /assign @fabriziopandini in a comment. For more information see: The Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
CI is failing with:
vsphere-clone: fatal: [default]: FAILED! => {"msg": "failed to transfer file to /root/.ansible/tmp/ansible-local-1386oe5wt7mv/tmpx_roh_r0 /tmp/.ansible/ansible-tmp-1661355877.2552273-1392-127598361667597/AnsiballZ_setup.py:\n\nunknown option -- O\r\nusage: scp [-346BCpqrv] [-c cipher] [-F ssh_config] [-i identity_file]\n [-l limit] [-o ssh_option] [-P port] [-S program] source ... target\n"}
So this is the host issue. It looks like the CI VMs there are running some older OpenSSH version. I'm not sure how to proceed from here. Should I simply close it? Should there at least be some docs on how to build with newer OpenSSH?
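One way to approach the question above would be to pass -O only when the local client is new enough to need it. OpenSSH 9.0 switched scp to the SFTP protocol by default, with -O requesting the legacy protocol, while the older scp in CI rejects the flag outright. The sketch below is illustrative only and is not what this PR implements. The names `parse_openssh_version` and `scp_extra_args` are hypothetical, the 9.0 cut-off is an assumption taken from the PR title, and the banner strings are made-up examples of `ssh -V` output.

```python
import re
from typing import List, Optional, Tuple


def parse_openssh_version(banner: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from an `ssh -V` banner such as 'OpenSSH_9.0p1, ...'."""
    match = re.search(r"OpenSSH_(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def scp_extra_args(banner: str) -> List[str]:
    """Return ['-O'] only for clients assumed to default to SFTP (9.0+)."""
    version = parse_openssh_version(banner)
    # Assumption: 9.0 is the cut-off, taken from the PR title. Unknown or
    # older clients get no extra flag, which avoids "unknown option -- O".
    if version is not None and version >= (9, 0):
        return ["-O"]
    return []


if __name__ == "__main__":
    # Example banners only; real ones come from running `ssh -V`.
    for banner in ("OpenSSH_7.9p1 Debian-10+deb10u2, OpenSSL 1.1.1n",
                   "OpenSSH_9.0p1, OpenSSL 1.1.1q"):
        print(banner.split(",")[0], "->", scp_extra_args(banner))
```

Note that `ssh -V` prints its banner to stderr, so a caller would need to capture that stream before passing the text to `scp_extra_args`.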
As I mentioned in https://github.com/kubernetes-sigs/image-builder/pull/907#discussion_r901413880, CI runs in a Docker container running Debian, or at least it did at that point in time.
@kopiczko: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Test name | Commit | Details | Required | Rerun command |
---|---|---|---|---|
json-sort-check | b209ad7e8ea5da8a5afb81408bf972b42eae9317 | link | true | /test json-sort-check |
pull-ova-all | b209ad7e8ea5da8a5afb81408bf972b42eae9317 | link | true | /test pull-ova-all |
pull-azure-vhds | b209ad7e8ea5da8a5afb81408bf972b42eae9317 | link | true | /test pull-azure-vhds |
pull-azure-sigs | b209ad7e8ea5da8a5afb81408bf972b42eae9317 | link | false | /test pull-azure-sigs |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Are there any updates on this? We are waiting on this to merge so that we can finalize our PR 😅
@aniruddha2000 unfortunately I don't think it's going anywhere without the CI OS being upgraded. Can you describe why it is blocking your PR?
@kopiczko the jobs use the latest container images to run the tests. If the image does not have the latest OpenSSH package, maybe we can open a PR for that somewhere?
@kopiczko Our ansible_env_vars were failing, but with your changes in this PR we successfully created our OS image.
So if your Ansible changes in this PR are merged, we will merge our changes too.
These changes fixed the following qemu image build errors for me. :+1:
fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Unable to negotiate with 127.0.0.1 port 46489: no matching host key type found. Their offer: ssh-rsa", "unreachable": true}
and
fatal: [default]: FAILED! => {"msg": "failed to transfer file to /home/joe/.ansible/tmp/ansible-local-1545693264eeguh/tmpmq7bxkpk /tmp/.ansible/ansible-tmp-1666398201.818839-1545697-148037159050528/AnsiballZ_setup.py:\n\n"}
@kkeshavamurthy do you know where that somewhere is?
@kopiczko: PR needs rebase.