docker-ssh-agent

wip: fix Windows tests

Open · lemeurherve opened this pull request 2 years ago · 10 comments

Fixes #292

Explanations:

  • Some variables (e.g. PUBLIC_SSH_KEY, PRIVATE_SSH_KEY) were empty when the tests used them; declaring them with the `$global:` scope modifier fixed it.
  • `docker run` was called with `--publish` but without specifying any port; calling it with `--publish-all` instead, which publishes all exposed ports to random host ports, fixed it.
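A minimal PowerShell sketch of the two fixes. It assumes the keys were previously assigned in a scope that the test blocks did not inherit, and that the image declares `EXPOSE 22` (which is what lets `--publish-all` map the SSH port). The function names and the key value are hypothetical, not the PR's actual test helpers; only `--publish-all` and `docker port` are standard Docker CLI features.

```powershell
# Hypothetical helpers for illustration only; not the PR's actual test code.

# A plain assignment inside a function (or any script block) creates the variable
# in that child scope, so it is gone once the function returns.
function Set-KeyLocally {
    $PUBLIC_SSH_KEY = 'ssh-ed25519 AAAA-example-key'
}

# The $global: scope modifier writes to the session-wide scope instead,
# so later script blocks (such as test blocks) can read the value.
function Set-KeyGlobally {
    $global:PUBLIC_SSH_KEY = 'ssh-ed25519 AAAA-example-key'
}

# `docker port <container> 22` prints one mapping per line, for example
# "0.0.0.0:50103" or "[::]:50103"; the host port follows the last colon.
function Get-HostPort([string] $Mapping) {
    return [int] $Mapping.Split(':')[-1]
}

# Starts an agent container with every EXPOSEd port (assumed: 22 for sshd)
# published to a random host port, then returns the host port mapped to 22.
function Start-TestAgent([string] $Name, [string] $Image, [string] $PublicKey) {
    docker run --detach --publish-all --name $Name $Image $PublicKey | Out-Null
    $mapping = docker port $Name 22 | Select-Object -First 1
    return Get-HostPort $mapping
}
```

`--publish-all` (short form `-P`) avoids hard-coding a host port, which is why the tests then have to ask `docker port` which port was picked before calling `ssh -p`.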

Testing done

### Submitter checklist
- [ ] Make sure you are opening from a **topic/feature/bugfix branch** (right side) and not your main branch!
- [ ] Ensure that the pull request title represents the desired changelog entry
- [ ] Please describe what you did
- [ ] Link to relevant issues in GitHub or Jira
- [ ] Link to relevant pull requests, esp. upstream and downstream changes
- [ ] Ensure you have provided tests - that demonstrates feature works or fixes the issue

lemeurherve · Aug 09 '23

Note: some tests are fixed, but I'm struggling with the remaining ones involving SSH, which fail on both the nanoserver and windowsservercore images:

```
Running tests from 'sshAgent.Tests.ps1'
 Starting Run-Program with cmd = docker.exe, params = inspect --format "{{.State.Running}}" pester-jenkins-ssh-agent-nanoserver-1809-jdk11
 Describing [nanoserver-1809-jdk11] create agent container with pubkey as argument
 Starting Run-Program with cmd = docker.exe, params = port pester-jenkins-ssh-agent-nanoserver-1809-jdk11 22
 Starting Run-Program with cmd = C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\ssh.exe, params = -i "C:\Windows\TEMP\tmpE779.tmp" -o LogLevel=quiet -o UserKnownHostsFile=NUL -o StrictHostKeyChecking=no -l jenkins localhost -p 50103 pwsh.exe -NoLogo -C "Write-Host 'f00'"

 stdout:

 stderr:

   [-] runs commands via ssh 2.27s (2.04s|231ms)
    Expected 0, but got 255.
    at $exitCode | Should -Be 0, C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\sshAgent.Tests.ps1:128
    at <ScriptBlock>, C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\sshAgent.Tests.ps1:128
```

Additionally, the nanoserver Docker image build fails with an error unrelated to this PR; it also occurs on the master branch: https://github.com/jenkinsci/docker-ssh-agent/issues/302

lemeurherve · Aug 12 '23

Error log with `ssh -v`:

```
OpenSSH_for_Windows_8.1p1, LibreSSL 2.9.2
debug1: Connecting to localhost [::1] port 50136.
debug1: connect to address ::1 port 50136: Connection refused
debug1: Connecting to localhost [127.0.0.1] port 50136.
debug1: Connection established.
debug1: identity file C:\Windows\TEMP\tmpEF2A.tmp type -1
debug1: identity file C:\Windows\TEMP\tmpEF2A.tmp-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_for_Windows_8.1
debug1: Remote protocol version 2.0, remote software version OpenSSH_for_Windows_9.2
debug1: match: OpenSSH_for_Windows_9.2 pat OpenSSH* compat 0x04000000
debug1: Authenticating to localhost:50136 as 'jenkins'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: [email protected] MAC: compression: none
debug1: kex: client->server cipher: [email protected] MAC: compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ecdsa-sha2-nistp256 SHA256:4ScISJjuOHGDkz1DbQ0AtkvLCpml0NABvLxfXso7i/8
debug1: checking without port identifier
Warning: Permanently added '[localhost]:50136' (ECDSA) to the list of known hosts.
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey in after 134217728 blocks
debug1: pubkey_prepare: ssh_get_authentication_socket: No such file or directory
debug1: Will attempt key: C:\Windows\TEMP\tmpEF2A.tmp explicit
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,[email protected],ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,[email protected],[email protected],ssh-dss,ssh-rsa,rsa-sha2-256,rsa-sha2-512>
debug1: kex_input_ext_info: [email protected] (unrecognised)
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey
debug1: Next authentication method: publickey
debug1: Trying private key: C:\Windows\TEMP\tmpEF2A.tmp
Load key "C:\Windows\TEMP\tmpEF2A.tmp": invalid format
debug1: No more authentication methods to try.
jenkins@localhost: Permission denied (publickey).
```

lemeurherve · Sep 15 '23

https://ci.jenkins.io/job/Packaging/job/docker-ssh-agent/view/change-requests/job/PR-295/34/console:

Finished: SUCCESS

🎉

Now trying with the previous version of OpenSSH; after that, I'll clean up so that only the fixes remain in this PR.

lemeurherve · Sep 15 '23

This is really frustrating: running the build and tests locally on a Windows 10 machine with `.\make.ps1 test` works flawlessly, but the SSH tests fail for Windows Server Core (but not Nanoserver) on ci.jenkins.io 🤔

https://ci.jenkins.io/job/Packaging/job/docker-ssh-agent/job/PR-295/58/console

```
Describing [jdk11-windowsservercore-ltsc2019] create agent container with pubkey as argument
Starting Run-Program with cmd = docker.exe, params = inspect --format "{{.State.Running}}" pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019
Starting Run-ThruSSH with container = pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019, privateKeyVal.Length = 1674, cmd = powershell.exe -NoLogo -C "Write-Host 'f00'"
Starting Run-Program with cmd = docker.exe, params = port pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019 22
Run-ThruSSH > Get-Port = 50167
Starting Run-Program with cmd = C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\ssh.exe, params = -v -i "C:\Windows\TEMP\tmpE026.tmp" -o LogLevel=verbose -o UserKnownHostsFile=NUL -o StrictHostKeyChecking=no -l jenkins localhost -p 50167 powershell.exe -NoLogo -C "Write-Host 'f00'"

stdout:

stderr:
OpenSSH_for_Windows_8.1p1, LibreSSL 2.9.2
debug1: Connecting to localhost [::1] port 50167.
debug1: connect to address ::1 port 50167: Connection refused
debug1: Connecting to localhost [127.0.0.1] port 50167.
debug1: connect to address 127.0.0.1 port 50167: Connection timed out
ssh: connect to host localhost port 50167: Connection timed out

Run-ThruSSH > Run-Program > stdout =
[-] runs commands via ssh 22.19s
Expected 0, but got 255.
123: $exitCode | Should -Be 0
at <ScriptBlock>, C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\sshAgent.Tests.ps1: line 123
```

```
Describing [jdk11-windowsservercore-ltsc2019] create agent container with pubkey as envvar
Starting Run-Program with cmd = docker.exe, params = inspect --format "{{.State.Running}}" pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019
Starting Run-ThruSSH with container = pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019, privateKeyVal.Length = 1674, cmd = powershell.exe -NoLogo -C "Write-Host 'f00'"
Starting Run-Program with cmd = docker.exe, params = port pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019 22
Run-ThruSSH > Get-Port = 50172
Starting Run-Program with cmd = C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\ssh.exe, params = -v -i "C:\Windows\TEMP\tmp50F2.tmp" -o LogLevel=verbose -o UserKnownHostsFile=NUL -o StrictHostKeyChecking=no -l jenkins localhost -p 50172 powershell.exe -NoLogo -C "Write-Host 'f00'"

stdout:

stderr:
OpenSSH_for_Windows_8.1p1, LibreSSL 2.9.2
debug1: Connecting to localhost [::1] port 50172.
debug1: connect to address ::1 port 50172: Connection refused
debug1: Connecting to localhost [127.0.0.1] port 50172.
debug1: connect to address 127.0.0.1 port 50172: Connection timed out
ssh: connect to host localhost port 50172: Connection timed out

Run-ThruSSH > Run-Program > stdout =
[-] runs commands via ssh 22.11s
Expected 0, but got 255.
140: $exitCode | Should -Be 0
at <ScriptBlock>, C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\sshAgent.Tests.ps1: line 140
```

```
Describing [jdk11-windowsservercore-ltsc2019] create agent container like docker-plugin with '/usr/sbin/sshd -D -p 22' as argument
Starting Run-Program with cmd = docker.exe, params = inspect --format "{{.State.Running}}" pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019
Starting Run-ThruSSH with container = pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019, privateKeyVal.Length = 1674, cmd = powershell.exe -NoLogo -C "Write-Host 'f00'"
Starting Run-Program with cmd = docker.exe, params = port pester-jenkins-ssh-agent-jdk11-windowsservercore-ltsc2019 22
Run-ThruSSH > Get-Port = 50178
Starting Run-Program with cmd = C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\ssh.exe, params = -v -i "C:\Windows\TEMP\tmpC1AE.tmp" -o LogLevel=verbose -o UserKnownHostsFile=NUL -o StrictHostKeyChecking=no -l jenkins localhost -p 50178 powershell.exe -NoLogo -C "Write-Host 'f00'"

stdout:

stderr:
OpenSSH_for_Windows_8.1p1, LibreSSL 2.9.2
debug1: Connecting to localhost [::1] port 50178.
debug1: connect to address ::1 port 50178: Connection refused
debug1: Connecting to localhost [127.0.0.1] port 50178.
debug1: connect to address 127.0.0.1 port 50178: Connection timed out
ssh: connect to host localhost port 50178: Connection timed out

Run-ThruSSH > Run-Program > stdout =
[-] runs commands via ssh 22.1s
Expected 0, but got 255.
160: $exitCode | Should -Be 0
at <ScriptBlock>, C:\Jenkins\agent\workspace\ackaging_docker-ssh-agent_PR-295\tests\sshAgent.Tests.ps1: line 160
```

Any idea @jenkinsci/team-docker-packaging?

lemeurherve · Sep 27 '23

> This is really frustrating: running the build and tests locally on a Windows 10 machine with .\make.ps1 test works flawlessly, but SSH tests fail for Windows Server Core (and not Nanoserver) in ci.jenkins.io

Create a VM based on the image in the image gallery; that's how I have always debugged failures in jenkinsci/docker.

timja · Sep 28 '23

> > This is really frustrating: running the build and tests locally on a Windows 10 machine with .\make.ps1 test works flawlessly, but SSH tests fail for Windows Server Core (and not Nanoserver) in ci.jenkins.io
>
> Create a VM based on the image in the image gallery, that's how I have always debugged failures in jenkinsci/docker.

+1 with Tim: Docker Desktop on your Windows 10/11 machine uses a different container isolation than a fully fledged Windows Server 2019/2022 running Docker CE with Windows containers (not the same kernel, nor the same hypervisor and system APIs).

dduportal · Sep 28 '23

@timja @dduportal I've spawned a VM using this image: `prod-packer-images/providers/Microsoft.Compute/images/jenkins-agent-windows-2019`

And... All tests passed, including those with Windows Server Core 🎉 😅 🤔

What could I try now?

Note that I've tried a replay with `windows-2019` as the agent label; it still fails on ci.jenkins.io with Windows Server Core.

lemeurherve · Sep 28 '23

The last build seemed to pass?

timja · Sep 28 '23

> The last build seemed to pass?

Many of them are green, but see https://github.com/jenkinsci/docker-ssh-agent/issues/291: builds stay green even when some tests fail.

lemeurherve · Sep 28 '23

About the green builds even with some tests failing:

The good news is that it fixes https://github.com/jenkinsci/docker-ssh-agent/issues/302: the error message is gone 🎉 (cf. https://ci.jenkins.io/job/Packaging/job/docker-ssh-agent/job/PR-319/2/console)

The (less?) good news is that it also fixes https://github.com/jenkinsci/docker-ssh-agent/issues/291: failing tests now fail the build, as expected 😅

From https://github.com/jenkinsci/docker-ssh-agent/pull/319#issuecomment-1721560438

I can mark #319 as "ready for review" so it can be merged right away, but I don't really know why it restored the build's ability to fail. I had already tried integrating this OpenSSH update into this PR (commit https://github.com/jenkinsci/docker-ssh-agent/pull/295/commits/fa20ba11e2f4eb0bab62a25d512f7f53c13b0035), but the corresponding build was green while the SSH tests were failing on Windows Server Core.

lemeurherve · Sep 28 '23

This should be started over now that all tests pass with the nanoserver images and only the SSH tests fail on the Windows Server Core images.

lemeurherve · Mar 27 '24