ec2-plugin

[JENKINS-70093] Add Windows SSH agent implementation

Open rdysart opened this issue 9 months ago • 8 comments

Adding support for launching Windows agents using SSH https://issues.jenkins.io/browse/JENKINS-70093

Testing done

mvn clean verify

Tested on a Jenkins 2.500 controller, launching and connecting to a Windows Server 2019 agent via SSH. Also tested with an init script:

Mar 11, 2025 11:17:29 AM hudson.plugins.ec2.EC2Cloud
INFO: Launching instance: i-XXXXXXXXXXXXXXXXX
Mar 11, 2025 11:17:29 AM hudson.plugins.ec2.EC2Cloud
INFO: bootstrap()
Mar 11, 2025 11:17:29 AM hudson.plugins.ec2.EC2Cloud
INFO: Getting keypair...
Mar 11, 2025 11:17:29 AM hudson.plugins.ec2.EC2Cloud
INFO: Using private key XXXXX (SHA-1 fingerprint XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX)
Mar 11, 2025 11:17:29 AM hudson.plugins.ec2.EC2Cloud
INFO: Authenticating as Administrator
Mar 11, 2025 11:17:29 AM hudson.plugins.ec2.EC2Cloud
INFO: Connecting to X.X.X.X on port 22, with timeout 10000.
Mar 11, 2025 11:18:29 AM hudson.plugins.ec2.EC2Cloud
INFO: Failed to connect via ssh: DefaultConnectFuture[Administrator@/X.X.X.X:22]: Failed (ConnectException) to execute: Connection timed out.
Mar 11, 2025 11:18:29 AM hudson.plugins.ec2.EC2Cloud
INFO: Waiting for SSH to come up. Sleeping 5.
Mar 11, 2025 11:18:34 AM hudson.plugins.ec2.EC2Cloud
INFO: Connecting to X.X.X.X on port 22, with timeout 10000.
Mar 11, 2025 11:18:50 AM hudson.plugins.ec2.EC2Cloud
INFO: Connected via SSH.
Mar 11, 2025 11:18:51 AM hudson.plugins.ec2.EC2Cloud
INFO: The instance EC2 (XXXXXXXX) - win (i-XXXXXXXXXXXXXXXXX) didn't print the host key. Expected a line starting with: "ecdsa-sha2-nistp256"
Mar 11, 2025 11:18:51 AM hudson.plugins.ec2.EC2Cloud
INFO: The SSH key (ecdsa-sha2-nistp256 XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX) presented by the instance has not been found on the instance console. Cannot check the key but the connection to EC2 (XXXXXXXX) - win (i-XXXXXXXXXXXXXXXXX) is allowed
Mar 11, 2025 11:18:51 AM hudson.plugins.ec2.EC2Cloud
INFO: connect fresh as Administrator
Mar 11, 2025 11:18:51 AM hudson.plugins.ec2.EC2Cloud
INFO: Connecting to X.X.X.X on port 22, with timeout 10000.
Mar 11, 2025 11:18:51 AM hudson.plugins.ec2.EC2Cloud
INFO: Connected via SSH.
Mar 11, 2025 11:18:51 AM hudson.plugins.ec2.EC2Cloud
INFO: Connection allowed after the host key has been verified
Mar 11, 2025 11:18:51 AM hudson.plugins.ec2.EC2Cloud
INFO: Creating tmp directory (C:\Windows\Temp\) if it does not exist
Mar 11, 2025 11:18:53 AM hudson.plugins.ec2.EC2Cloud
INFO: Upload init script
Mar 11, 2025 11:18:54 AM hudson.plugins.ec2.EC2Cloud
INFO: Executing init script

administrator@XXXXXXX-XXXXXXX C:\Users\Administrator>ECHO INIT SCRIPT HELLO WORLD
INIT SCRIPT HELLO WORLD
Mar 11, 2025 11:18:54 AM hudson.plugins.ec2.EC2Cloud
INFO: Creating %USERPROFILE%\.hudson-run-init
        1 file(s) copied.
Mar 11, 2025 11:18:54 AM hudson.plugins.ec2.EC2Cloud
INFO: Copying remoting.jar to: C:\Windows\Temp\
Mar 11, 2025 11:18:55 AM hudson.plugins.ec2.EC2Cloud
INFO: Launching remoting agent (via SSH2 Connection):  java  -jar C:\Windows\Temp\remoting.jar -workDir C:\jenkins
Mar 11, 2025 11:18:55 AM hudson.plugins.ec2.EC2Cloud
INFO: Connecting to X.X.X.X on port 22, with timeout 10000.
Mar 11, 2025 11:18:55 AM hudson.plugins.ec2.EC2Cloud
INFO: Connected via SSH.
Mar 11, 2025 11:18:55 AM hudson.plugins.ec2.EC2Cloud
INFO: Connection allowed after the host key has been verified
<===[JENKINS REMOTING CAPACITY]===>Remoting version: 3283.v92c105e0f819
Launcher: EC2WindowsSSHLauncher
Communication Protocol: Standard in/out
This is a Windows agent
Agent successfully connected and online
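
For readers following the log above: the "Connecting ... with timeout 10000" / "Waiting for SSH to come up. Sleeping 5." lines describe a connect-and-retry loop. Below is a minimal sketch of that pattern using plain `java.net.Socket`. It is not the plugin's actual code; the class `SshPortWaiter`, the method `waitForPort`, and the `maxAttempts` limit are hypothetical, and the units (milliseconds for the timeout, seconds for the sleep) are assumed from the log.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper, not part of the ec2-plugin: polls a TCP port until it accepts connections. */
final class SshPortWaiter {

    /**
     * Tries to open a TCP connection up to maxAttempts times, sleeping between attempts.
     * Returns true as soon as one attempt succeeds, false if every attempt fails
     * or the waiting thread is interrupted.
     */
    static boolean waitForPort(String host, int port, int connectTimeoutMs,
                               long sleepMs, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
                return true; // port is reachable; a real launcher would start the SSH handshake here
            } catch (IOException e) {
                // Connection refused or timed out: the instance is probably still booting.
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return false;
                }
            }
        }
        return false;
    }
}
```

With the values from the log, a call would look like `SshPortWaiter.waitForPort(host, 22, 10_000, 5_000, 60)`, where `host` and the limit of 60 attempts are placeholders.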

Submitter checklist

  • [x] Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • [x] Ensure that the pull request title represents the desired changelog entry
  • [x] Please describe what you did
  • [x] Link to relevant issues in GitHub or Jira
  • [ ] Link to relevant pull requests, esp. upstream and downstream changes
  • [x] Ensure you have provided tests that demonstrate the feature works or the issue is fixed

rdysart avatar Mar 11 '25 15:03 rdysart

Should be linked to https://issues.jenkins.io/browse/JENKINS-70093.

Dohbedoh avatar Mar 28 '25 06:03 Dohbedoh

Hi @Dohbedoh, is there any update on reviewing this PR? This would be really helpful for our case as well, since we are also using WinRM on Jenkins and it takes 15 min per provision. Thanks.

peterzhuamazon avatar Jun 19 '25 21:06 peterzhuamazon

LGTM. Will need the maintainers to double-check, though. Ping @jenkinsci/ec2-plugin-developers

Dohbedoh avatar Jun 19 '25 23:06 Dohbedoh

Thanks @Dohbedoh.

Hi @res0nance @fcojfernandez, could you help take a look?

Thanks.

peterzhuamazon avatar Jun 20 '25 14:06 peterzhuamazon

Has this been tested with the older stuff, e.g. Linux SSH, to ensure the refactoring didn't break the existing functionality?

res0nance avatar Jun 22 '25 02:06 res0nance

> Has this been tested with the older stuff, e.g. Linux SSH, to ensure the refactoring didn't break the existing functionality?

Yes, I have been using it with both Windows and Linux SSH agents. I also tested that WinRM agents still worked. I haven't tested Mac agents.

rdysart avatar Jun 22 '25 15:06 rdysart

> > Has this been tested with the older stuff, e.g. Linux SSH, to ensure the refactoring didn't break the existing functionality?
>
> Yes, I have been using it with both Windows and Linux SSH agents. I also tested that WinRM agents still worked. I haven't tested Mac agents.

Is it possible to test it on Mac agents as well?

res0nance avatar Jun 23 '25 10:06 res0nance

> > > Has this been tested with the older stuff, e.g. Linux SSH, to ensure the refactoring didn't break the existing functionality?
> >
> > Yes, I have been using it with both Windows and Linux SSH agents. I also tested that WinRM agents still worked. I haven't tested Mac agents.
>
> Is it possible to test it on Mac agents as well?

Testing Mac would require reserving a dedicated Mac host for a minimum of 24 hours, which I'm not willing to do. I can revert the changes to EC2MacLauncher/EC2UnixLauncher if you'd like, but I can't test that Mac agents still work.

rdysart avatar Jun 24 '25 17:06 rdysart

Thanks all for making this happen!

peterzhuamazon avatar Jul 07 '25 16:07 peterzhuamazon

Hi @rdysart, thanks for this contribution!

On the official Jenkins infrastructure (ci.jenkins.io), we are interested in using it. So far, our tests are not working: we see the SSH connection being established, but the VM is reclaimed before the agent process starts. We are currently investigating, but I wonder if you could share your configuration with us:

  • EC2 cloud setup / JCasC (anonymized of course)
  • Which AMI are you using? An official one or a custom-made one?

We are particularly impressed by the timings in your logs (in the PR body): we currently see 3 to 5 min for Windows agent availability with prebaked SSH keys and EC2 Fast Launch enabled. If we could get close to your startup performance, that would be great (once we make it work first ;) )
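
For reference, the timestamps in the PR body's log put "Launching instance" at 11:17:29 and "Launching remoting agent" at 11:18:55, which is 86 seconds as logged. A small sketch of that arithmetic follows; the class `LogTimings` and method `secondsBetween` are hypothetical names used only for illustration, and the timestamp pattern is an assumption based on the log lines shown in the PR body.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical helper for reading timestamps in the style of the log in the PR body. */
final class LogTimings {

    // Matches timestamps like "Mar 11, 2025 11:17:29 AM".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MMM d, yyyy h:mm:ss a", Locale.US);

    /** Returns the whole seconds elapsed between two log timestamps. */
    static long secondsBetween(String start, String end) {
        LocalDateTime from = LocalDateTime.parse(start, FORMAT);
        LocalDateTime to = LocalDateTime.parse(end, FORMAT);
        return Duration.between(from, to).getSeconds();
    }
}
```

For example, `LogTimings.secondsBetween("Mar 11, 2025 11:17:29 AM", "Mar 11, 2025 11:18:55 AM")` returns 86.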

dduportal avatar Jul 18 '25 17:07 dduportal

> Hi @rdysart, thanks for this contribution!
>
> On the official Jenkins infrastructure (ci.jenkins.io), we are interested in using it. So far, our tests are not working: we see the SSH connection being established, but the VM is reclaimed before the agent process starts. We are currently investigating, but I wonder if you could share your configuration with us:
>
>   • EC2 cloud setup / JCasC (anonymized of course)
>   • Which AMI are you using? An official one or a custom-made one?
>
> We are particularly impressed by the timings in your logs (in the PR body): we currently see 3 to 5 min for Windows agent availability with prebaked SSH keys and EC2 Fast Launch enabled. If we could get close to your startup performance, that would be great (once we make it work first ;) )

I've found that certain versions of OpenSSH do not work and have opened a PR to address this. PR-1118

I'm using a custom Windows AMI based on the 2019 Base image. Below is the script I use when configuring it:

# Install the Temurin 21 JRE and add it to the machine-wide PATH
Invoke-WebRequest -Uri 'https://github.com/adoptium/temurin21-binaries/releases/download/jdk-21.0.7%2B6/OpenJDK21U-jre_x64_windows_hotspot_21.0.7_6.zip' -OutFile 'C:\Windows\Temp\jre.zip'
Expand-Archive -Path 'C:\Windows\Temp\jre.zip' -DestinationPath 'C:\Program Files'
$localPath = [Environment]::GetEnvironmentVariable('PATH', [EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("JAVA_HOME", "C:\Program Files\jdk-21.0.7+6-jre", [EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("PATH", "$localPath;C:\Program Files\jdk-21.0.7+6-jre\bin", [EnvironmentVariableTarget]::Machine)

# Install Win32-OpenSSH v8.9.1.0p1-Beta and drop the built-in OpenSSH directory from PATH
Invoke-WebRequest -Uri 'https://github.com/PowerShell/Win32-OpenSSH/releases/download/v8.9.1.0p1-Beta/OpenSSH-Win64.zip' -OutFile 'C:\Windows\Temp\openssh.zip'
Expand-Archive -Path 'C:\Windows\Temp\openssh.zip' -DestinationPath 'C:\Program Files'
$localPath = [Environment]::GetEnvironmentVariable('PATH', [EnvironmentVariableTarget]::Machine)
$localPath = $localPath -replace 'C:\\Windows\\System32\\OpenSSH\\;', ''
[Environment]::SetEnvironmentVariable("PATH", "$localPath", [EnvironmentVariableTarget]::Machine)
& 'C:\Program Files\OpenSSH-Win64\uninstall-sshd.ps1'
& 'C:\Program Files\OpenSSH-Win64\install-sshd.ps1'
$localPath = [Environment]::GetEnvironmentVariable('PATH', [EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("PATH", "$localPath;C:\Program Files\OpenSSH-Win64", [EnvironmentVariableTarget]::Machine)
Set-Service -Name 'sshd' -StartupType Automatic

# Allow inbound SSH (TCP port 22) through the Windows firewall
New-NetFirewallRule -Name 'OpenSSH-Server-In-TCP' -DisplayName 'OpenSSH Server (sshd)' -Enabled True -Direction Inbound -Protocol TCP -Action Allow -LocalPort 22

# Authorize the SSH public key for accounts in the Administrators group
New-Item -Path 'C:\ProgramData\ssh\administrators_authorized_keys' -ItemType File
Add-Content -Path 'C:\ProgramData\ssh\administrators_authorized_keys' -Value '<SSH PUBLIC KEY>'

JCasC:

jenkins:
  clouds:
  - amazonEC2:
      name: "ec2"
      region: "us-east-1"
      sshKeysCredentialsId: "KEY"
      templates:
      - ami: "ami-00000000000000000"
        amiOwners: "123456789012"
        amiType:
          windowsSSHData:
            sshPort: "22"
        associatePublicIp: true
        connectBySSHProcess: false
        connectionStrategy: PRIVATE_IP
        deleteRootOnTermination: false
        description: "win"
        ebsEncryptRootVolume: DEFAULT
        ebsOptimized: false
        enclaveEnabled: false
        hostKeyVerificationStrategy: CHECK_NEW_SOFT
        idleTerminationMinutes: "5"
        javaPath: "java"
        labelString: "win"
        maxTotalUses: -1
        metadataEndpointEnabled: true
        metadataHopsLimit: 1
        metadataSupported: true
        metadataTokensRequired: false
        minimumNumberOfInstances: 0
        minimumNumberOfSpareInstances: 0
        mode: EXCLUSIVE
        monitoring: false
        numExecutors: 1
        remoteAdmin: "Administrator"
        remoteFS: "C:\\jenkins"
        securityGroups: "sg-00000000000000000"
        stopOnTerminate: false
        subnetId: "subnet-00000000000000000"
        t2Unlimited: false
        tenancy: Default
        type: "t3.medium"
        useEphemeralDevices: false
      useInstanceProfileForCredentials: false

rdysart avatar Jul 21 '25 01:07 rdysart