
Fix SSH/scp/SFTP issues for OpenSSH 9.0+ and Flatcar stable

Open kopiczko opened this issue 2 years ago • 27 comments

This PR tries to fix various issues with Flatcar builds. It fixes multiple issues, but at this point it isn't possible to test them in isolation.

Fixes https://github.com/kubernetes-sigs/image-builder/issues/905

Workaround for https://github.com/kubernetes-sigs/image-builder/issues/859


kopiczko, Jun 01 '22 15:06

Welcome @kopiczko!

It looks like this is your first PR to kubernetes-sigs/image-builder 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/image-builder has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot, Jun 01 '22 15:06

Hi @kopiczko. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot, Jun 01 '22 15:06

CC @invidian

kopiczko, Jun 01 '22 15:06

Unknown CLA label state. Rechecking for CLA labels.

Send feedback to sig-contributor-experience at kubernetes/community.

/check-cla
/easycla

k8s-triage-robot, Jun 01 '22 19:06

> I'm testing building and I get this error right now:
>
> sig-flatcar: TASK [python : Get distribution name from lsb-release] *************************
> sig-flatcar: fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Unable to negotiate with 127.0.0.1 port 34989: no matching host key type found. Their offer: ssh-rsa", "unreachable": true}
>
> This was with Flatcar versions 3139.2.1 and 3033.2.4.

Both those versions build fine for me :(

kopiczko, Jun 02 '22 10:06
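The `ssh-rsa` negotiation error quoted above is reported by the OpenSSH client on the build host, which would explain why the same Flatcar versions build on one machine and fail on another. A minimal Python sketch for checking the local client, assuming (per the OpenSSH 8.8 release notes) that clients from 8.8 onward disable `ssh-rsa` signatures by default; the function names are hypothetical and not part of image-builder:

```python
import re
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int]


def parse_openssh_version(banner: str) -> Optional[Version]:
    """Extract (major, minor) from an `ssh -V` banner, e.g. 'OpenSSH_9.0p1, ...'."""
    match = re.search(r"OpenSSH_(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def local_openssh_version() -> Optional[Version]:
    """Run `ssh -V` on this host; OpenSSH writes its banner to stderr."""
    try:
        result = subprocess.run(["ssh", "-V"], capture_output=True, text=True)
    except FileNotFoundError:
        return None  # no ssh client on PATH
    return parse_openssh_version(result.stderr or result.stdout)


def client_disables_ssh_rsa(version: Version) -> bool:
    """Assumption: OpenSSH >= 8.8 refuses ssh-rsa (RSA/SHA-1) signatures by default."""
    return version >= (8, 8)
```

For example, `client_disables_ssh_rsa(parse_openssh_version("OpenSSH_9.0p1"))` is `True`, while an 8.2 client gives `False`.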

> Both those versions build fine for me :(

I'll do some more testing to try to figure out what's going on :+1:

invidian, Jun 02 '22 11:06

CC @johananl

kopiczko, Jun 16 '22 19:06

/assign

kopiczko, Jun 17 '22 18:06

So what is the exact state of this PR? It seems to be fixing two problems (the etcd script, and some SSH improvements?). Those should perhaps be two different PRs. This also has 5 commits, some of which are merge commits (and will block the PR).

codenrhoden, Jun 23 '22 22:06

Opened https://github.com/kubernetes-sigs/image-builder/pull/923 with just Flatcar build fix.

invidian, Jun 27 '22 09:06

I'd like to have https://github.com/kubernetes-sigs/image-builder/pull/912 merged first, and then I could squash it. Or I can simply remove the file from this PR.

kopiczko, Jun 27 '22 17:06

@codenrhoden I updated the title. After extracting https://github.com/kubernetes-sigs/image-builder/pull/912 and https://github.com/kubernetes-sigs/image-builder/pull/923, the scope is narrowed down to fixing OpenSSH 9.0+ issues and the sftp binary in Flatcar, which the new title reflects. All commits are squashed.

kopiczko, Jun 28 '22 09:06

I'd expect at least 3 commits here, though: one refactoring the parameters into a common place, one with the ssh-rsa fix referring to the appropriate issue, and another one for the SFTP fix. As merge commits are used in this repository, I think it would be valuable to keep things separate and small, in case some patch needs to be reverted later or for bisecting.

invidian, Jun 28 '22 09:06

The commit message is not aligned with the content anymore.

invidian, Jul 05 '22 17:07

@codenrhoden @invidian from my POV this is ready to be merged. I tested it, split it into cherry-pickable commits, and rebased.

@aniruddha2000 I added quotes around -O. It was failing for me as well; I didn't notice at first because of a quirk in my dev branch.

kopiczko, Jul 13 '22 19:07
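A plausible explanation for why the quotes around -O matter, assuming the flag is passed as `--scp-extra-args` on the `ansible-playbook` command line (Ansible's CLI is built on Python's `argparse`) and that the value is later split shell-style: a bare `-O` looks like another option, so `--scp-extra-args` is left without a value, while `'-O'` is accepted as the value and loses its quotes when split. The parser below is a hypothetical stand-in, not Ansible's real one:

```python
import argparse
import shlex
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the ansible-playbook CLI: one string option.
    parser = argparse.ArgumentParser(prog="ansible-playbook-sketch")
    parser.add_argument("--scp-extra-args", default="")
    return parser


def parse_scp_extra_args(argv: List[str]) -> List[str]:
    """Parse argv, then split the option value the way a POSIX shell would."""
    namespace = build_parser().parse_args(argv)
    return shlex.split(namespace.scp_extra_args)


# Quoted: argparse stores the string "'-O'", and shlex.split() yields ["-O"].
quoted = parse_scp_extra_args(["--scp-extra-args", "'-O'"])

# Unquoted: argparse sees "-O" as an option of its own, reports that
# --scp-extra-args expected one argument, and exits via SystemExit.
try:
    parse_scp_extra_args(["--scp-extra-args", "-O"])
    bare_rejected = False
except SystemExit:
    bare_rejected = True
```

In this stand-in, `--scp-extra-args=-O` is also accepted, because argparse does not re-check a value attached with `=`.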

/unassign @kopiczko
/assign @fabriziopandini

kopiczko, Jul 14 '22 18:07

I don't have enough context about imageBuilder to validate a change impacting so many providers 😓 It would be great to have folks with specific expertise involved instead.
/unassign

fabriziopandini, Jul 19 '22 09:07

Looks like the failures are due to some of the SSH changes; both 1804 and 2004 are failing with:

vhd-ubuntu-2004: TASK [include_role : python] ***************************************************
vhd-ubuntu-2004: [WARNING]: raw module does not support the environment keyword
vhd-ubuntu-2004:
vhd-ubuntu-2004: TASK [python : Get distribution name from lsb-release] *************************
vhd-ubuntu-2004: fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: command-line: line 0: Bad configuration option: pubkeyacceptedalgorithms", "unreachable": true}
vhd-ubuntu-2004:
vhd-ubuntu-2004: PLAY RECAP *********************************************************************
vhd-ubuntu-2004: default                    : ok=0    changed=0    unreachable=1    failed=0    skipped=0

jsturtevant, Jul 21 '22 16:07
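The `Bad configuration option: pubkeyacceptedalgorithms` failure on the 18.04 and 20.04 builds suggests the client running Ansible predates the `PubkeyAcceptedAlgorithms` keyword. Assuming, per the OpenSSH release notes, that 8.5 renamed `PubkeyAcceptedKeyTypes` to `PubkeyAcceptedAlgorithms` and that 8.8 disabled `ssh-rsa` by default, one hypothetical approach is to emit the overrides only for clients that need them. This is a sketch, not the change made in this PR:

```python
from typing import List, Tuple


def ssh_rsa_compat_args(client_version: Tuple[int, int]) -> List[str]:
    """Return extra `ssh` arguments that re-enable ssh-rsa, or none at all.

    Assumptions (from OpenSSH release notes, not from this PR):
      * clients older than 8.8 still accept ssh-rsa by default, so they need
        no overrides (and clients older than 8.5 reject the newer keyword);
      * 8.5 renamed PubkeyAcceptedKeyTypes to PubkeyAcceptedAlgorithms.
    """
    if client_version < (8, 8):
        return []
    return [
        "-o", "HostKeyAlgorithms=+ssh-rsa",
        "-o", "PubkeyAcceptedAlgorithms=+ssh-rsa",
    ]
```

With this sketch an 8.2 client gets an empty list, so the older `ssh` never sees the unknown keyword, while a 9.0 client gets both `-o` overrides.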

@kopiczko do you have any updates on this?

kkeshavamurthy, Aug 23 '22 17:08

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: invidian, kopiczko. Once this PR has been reviewed and has the lgtm label, please assign fabriziopandini for approval by writing /assign @fabriziopandini in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot, Aug 24 '22 15:08

CI is failing with:

vsphere-clone: fatal: [default]: FAILED! => {"msg": "failed to transfer file to /root/.ansible/tmp/ansible-local-1386oe5wt7mv/tmpx_roh_r0 /tmp/.ansible/ansible-tmp-1661355877.2552273-1392-127598361667597/AnsiballZ_setup.py:\n\nunknown option -- O\r\nusage: scp [-346BCpqrv] [-c cipher] [-F ssh_config] [-i identity_file]\n           [-l limit] [-o ssh_option] [-P port] [-S program] source ... target\n"}

So this is the host issue. It looks like the CI VMs there are running some older OpenSSH version. I'm not sure how to proceed from here. Should I simply close it? Should there at least be some docs on how to build with newer OpenSSH?

kopiczko, Aug 24 '22 15:08
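`unknown option -- O` means the `scp` on the CI host predates the flag. Assuming `-O` (use the legacy SCP protocol instead of SFTP) first appeared in OpenSSH 9.0, the release that made `scp` default to SFTP, a hypothetical way to keep older hosts working is to add the flag only when the client supports it:

```python
from typing import List, Tuple


def scp_extra_args(client_version: Tuple[int, int]) -> List[str]:
    """Return the scp flags to pass for a given OpenSSH client version.

    Assumption: OpenSSH 9.0 switched scp to the SFTP protocol by default and
    added -O to request the legacy protocol. The legacy protocol avoids the
    need for an SFTP server on the guest, while older scp binaries already
    use it and reject -O with "unknown option -- O".
    """
    if client_version >= (9, 0):
        return ["-O"]
    return []
```

The trade-off is that the version has to be detected on the machine that actually runs Ansible (here, the CI container), not on a developer's workstation.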

> So this is the host issue. It looks like the CI VMs there are running some older OpenSSH version. I'm not sure how to proceed from here. Should I simply close it? Should there at least be some docs on how to build with newer OpenSSH?

As I mentioned in https://github.com/kubernetes-sigs/image-builder/pull/907#discussion_r901413880, CI runs in a Docker container running Debian, or at least it did at that point in time.

invidian, Aug 24 '22 16:08

@kopiczko: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| json-sort-check | b209ad7e8ea5da8a5afb81408bf972b42eae9317 | link | true | /test json-sort-check |
| pull-ova-all | b209ad7e8ea5da8a5afb81408bf972b42eae9317 | link | true | /test pull-ova-all |
| pull-azure-vhds | b209ad7e8ea5da8a5afb81408bf972b42eae9317 | link | true | /test pull-azure-vhds |
| pull-azure-sigs | b209ad7e8ea5da8a5afb81408bf972b42eae9317 | link | false | /test pull-azure-sigs |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot, Aug 24 '22 16:08

Are there any updates on this? We are waiting on this to merge so that we can finalize our PR 😅

aniruddha2000, Sep 24 '22 14:09

> Are there any updates on this? We are waiting on this to merge so that we can finalize our PR 😅

@aniruddha2000 unfortunately I don't think it's going anywhere without CI OS being upgraded. Can you describe why it is blocking your PR?

kopiczko, Sep 26 '22 18:09

> CI is failing with:
>
> vsphere-clone: fatal: [default]: FAILED! => {"msg": "failed to transfer file to /root/.ansible/tmp/ansible-local-1386oe5wt7mv/tmpx_roh_r0 /tmp/.ansible/ansible-tmp-1661355877.2552273-1392-127598361667597/AnsiballZ_setup.py:\n\nunknown option -- O\r\nusage: scp [-346BCpqrv] [-c cipher] [-F ssh_config] [-i identity_file]\n           [-l limit] [-o ssh_option] [-P port] [-S program] source ... target\n"}
>
> So this is the host issue. It looks like the CI VMs there are running some older OpenSSH version. I'm not sure how to proceed from here. Should I simply close it? Should there at least be some docs on how to build with newer OpenSSH?

@kopiczko the jobs use the latest container images to run the tests. If we do not have the latest openSSH package on it, maybe we can PR that somewhere?

kkeshavamurthy, Sep 26 '22 22:09

@kopiczko Our ansible_env_vars were failing, but with the changes in this PR we successfully built our OS image. So once your Ansible changes in this PR are merged, we will merge our changes too.

aniruddha2000, Sep 27 '22 05:09

These changes fixed the following qemu image build errors for me. :+1:

fatal: [default]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Unable to negotiate with 127.0.0.1 port 46489: no matching host key type found. Their offer: ssh-rsa", "unreachable": true}

and

fatal: [default]: FAILED! => {"msg": "failed to transfer file to /home/joe/.ansible/tmp/ansible-local-1545693264eeguh/tmpmq7bxkpk /tmp/.ansible/ansible-tmp-1666398201.818839-1545697-148037159050528/AnsiballZ_setup.py:\n\n"}

kralicky, Oct 22 '22 00:10
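The thread reports three distinct client-side SSH failures. A small lookup helper that maps each quoted message to a likely cause could make future reports easier to triage; it is a hypothetical sketch, and the causes are assumptions based on OpenSSH release history, not something verified in this PR:

```python
from typing import Optional

# Substrings taken from error messages quoted in this thread, mapped to a
# likely cause. Hypothetical triage helper, not part of image-builder.
KNOWN_SSH_FAILURES = {
    "no matching host key type found. Their offer: ssh-rsa":
        "client OpenSSH refuses ssh-rsa (disabled by default since 8.8)",
    "Bad configuration option: pubkeyacceptedalgorithms":
        "client OpenSSH is older than 8.5 and lacks PubkeyAcceptedAlgorithms",
    "unknown option -- O":
        "scp is older than OpenSSH 9.0 and does not support -O",
}


def diagnose(ansible_message: str) -> Optional[str]:
    """Return the likely cause of a known failure, or None if unrecognized."""
    for needle, cause in KNOWN_SSH_FAILURES.items():
        if needle in ansible_message:
            return cause
    return None
```

The second error in the comment above, a `failed to transfer file` message with no further detail, matches none of the entries and returns `None`.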

> > CI is failing with:
> >
> > vsphere-clone: fatal: [default]: FAILED! => {"msg": "failed to transfer file to /root/.ansible/tmp/ansible-local-1386oe5wt7mv/tmpx_roh_r0 /tmp/.ansible/ansible-tmp-1661355877.2552273-1392-127598361667597/AnsiballZ_setup.py:\n\nunknown option -- O\r\nusage: scp [-346BCpqrv] [-c cipher] [-F ssh_config] [-i identity_file]\n           [-l limit] [-o ssh_option] [-P port] [-S program] source ... target\n"}
> >
> > So this is the host issue. It looks like the CI VMs there are running some older OpenSSH version. I'm not sure how to proceed from here. Should I simply close it? Should there at least be some docs on how to build with newer OpenSSH?
>
> @kopiczko the jobs use the latest container images to run the tests. If we do not have the latest openSSH package on it, maybe we can PR that somewhere?

@kkeshavamurthy do you know where that somewhere is?

kopiczko, Oct 27 '22 09:10

@kopiczko: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot, Nov 03 '22 05:11