[Feature request] Windows container support

Open BlueAndi opened this issue 5 years ago • 44 comments

It would be great to have Windows container support as well.

BlueAndi avatar Aug 14 '19 13:08 BlueAndi

What exactly is / isn't supported on the Windows side for this plugin?

We currently have ephemeral Jenkins build agents on the Linux side via this plugin and need to set one up for Windows as well.

bverkron avatar Sep 30 '19 22:09 bverkron

That's a good question, and I don't know the answer as I don't use Windows containers myself (and nobody who does has set out exactly what the issues are).

FYI, internally, the plugin doesn't care what OS you're using. Internally, it's all Java (as is Jenkins as a whole) and it's talking to the docker daemon(s) via a Java library (docker-java) so if Microsoft's implementation of docker is a compliant implementation of docker (rather than something that is not docker but that Microsoft call docker, which can happen when corporations believe there's "no standard they can't improve on" :angry: ) then it should "just work" ... ... but I presume there must be at least one reason why it doesn't "just work" otherwise folks wouldn't be raising this kind of issue.

If anyone's willing to investigate and implement this (see CONTRIBUTING guidelines) then I'd be happy to review the code and, ultimately, merge things in.

pjdarton avatar Oct 01 '19 11:10 pjdarton

Sounds like my assumption that it should work (but might have some quirks / problems) was correct.

We're attempting to use docker containers in our declarative pipelines but it's falling down in an odd place on the Windows slaves whereas it worked on the Linux side with the identical setup.

It's not clear to me if this is caused by the way the plugin behaves with Windows slaves or if it's some different behavior in the Windows implementation of Docker.

Pipeline script (same syntax as the working Linux jobs except for the image and label values):

pipeline {
    agent {
        docker {
            image 'iis'
            label 'dockerEnabledWindows'
        }
    }
    stages {
        stage('Example Build') {
            steps {
                sh 'hostname'
            }
        }
    }
}

Build Console Log

Running on Windows in C:/jenkins/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled Windows slave
[Pipeline] {
[Pipeline] withEnv
[Pipeline] {
[Pipeline] withDockerRegistry
Using the existing docker config file.Removing blacklisted property: auths$ docker login -u ****** -p ******** https://index.docker.io/v1/
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
Error response from daemon: Get https://registry-1.docker.io/v2/: unauthorized: incorrect username or password
[Pipeline] // withDockerRegistry
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
ERROR: docker login failed
Finished: FAILURE

Below is the first error we got, so I set up an account with Docker Hub and ran the docker login command to cache the login, hoping that would work. But it just produced a variation of the same error (above).

$ docker login -u ******* -p ******** https://index.docker.io/v1/
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
Error response from daemon: Get https://registry-1.docker.io/v2/: unauthorized: incorrect username or password

The docker login is the odd piece to me, as we're not making any explicit attempts to connect to a registry, and definitely not a private one, so IMO there should be no need for the login command or credentials. On the Linux side it happily connects to Docker Hub without the login command as far as I can tell and we've never had to do anything with credentials.

Trying to determine where the problem might lie so I know whether to dig down the Jenkins Docker plugin path or the Windows Docker configuration path

bverkron avatar Oct 01 '19 17:10 bverkron

Ah, withDockerRegistry suggests that you're using the docker-workflow-plugin not the docker-plugin. Different plugin, different way of working, different GitHub repo ... and not a plugin I know much about, aside from everyone confusing it for this one.

pjdarton avatar Oct 01 '19 19:10 pjdarton

What would cause Jenkins to use the docker-workflow-plugin instead of docker-plugin like the other job? They are configured the same way (literally just the declarative pipeline script above) and just pointing at a Linux vs Windows node / image

This is what the Linux node job runs...

[Pipeline] {
[Pipeline] sh
+ docker inspect -f . maven
.
[Pipeline] withDockerContainer
Linux does not seem to be running inside a container
$ docker run -t -d -u 164263:164263 -w "/home/_jenkinsauto/remote/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave@2" -v "/home/_jenkinsauto/remote/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave@2:/home/_jenkinsauto/remote/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave@2:rw,z" -v "/home/_jenkinsauto/remote/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave@2@tmp:/home/_jenkinsauto/remote/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave@2@tmp:rw,z" -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** maven cat
$ docker top 489b5b3d2dc7327c787cfdf8044a945b9735737c690c3162abdb45e051138e71 -eo pid,comm
[Pipeline] {
[Pipeline] stage
[Pipeline] { (Example Build)
[Pipeline] sh
...

Edit: Re-running the linux job now returns the same docker login error as the Windows job. AFAIK nothing in the Jenkins master config has changed. Where / how do we specify which plugin to use? Uninstall of the docker-workflow-plugin (i.e. Docker Pipeline) doesn't seem to be an option as it's inactive under pluginManager/installed and I can't interact with it.

bverkron avatar Oct 01 '19 20:10 bverkron

My test setup was a PC with Windows 10, with Docker Desktop CE installed and the docker daemon running, as well as a registry on it. It provides several agents, based on Windows docker images. The Jenkins master runs on a different PC.

I configured a "windows" cloud in the Jenkins master (provided by the docker plugin) with a test agent.

Using this test agent in a job now results in a corresponding docker container being created, but access to it then fails.

Tomorrow I can provide the error logs and more information about the configuration.

BlueAndi avatar Oct 01 '19 20:10 BlueAndi

Edit: Re-running the linux job now returns the same docker login error as the Windows job. AFAIK nothing in the Jenkins master config has changed. Where / how do we specify which plugin to use?

Found the solution to the docker login problem. Under Manage Jenkins > Configure System > Pipeline Model Definition a value had been selected under Registry credentials. Since the other two fields were blank, it was simply trying to force a login to the public Docker Hub with the selected credentials and, being a global setting, it was affecting all docker-related jobs.

This also indicates that the pipeline scripts were not actually using the docker-workflow-plugin (aka Docker Pipeline) plugin as previously suggested.

Now the problem becomes the docker-plugin seemingly trying to treat the Windows host as a Linux host and trying to execute the nohup command.

Running on Windows in C:/jenkins/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave Windows
[Pipeline] {
[Pipeline] sh
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
...
Caused: java.io.IOException: Cannot run program "nohup" (in directory "C:\jenkins\workspace\Ephemeral Build Agent PoC\Ephemeral Build Agent PoC v6 Docker enabled slave Windows"): CreateProcess error=2, The system cannot find the file specified

This could be a configuration thing, but it also seems that docker-plugin is trying to use the nohup command when doing the docker steps, which of course isn't available on Windows by default.

I think this can be solved via a process like this (i.e. installing & configuring git-bash or similar) but is this the expected / correct setup for docker on Windows slave hosts? https://stackoverflow.com/a/45151156

Will poke around but any guidance would be appreciated.

bverkron avatar Oct 01 '19 22:10 bverkron

Disclaimer: on mobile, from home, going from memory and not looking stuff up as it's my bed time...

Take a look at the "advanced" connection properties, e.g. jnlp or direct or SSH. In there you may find the ability to override default "start slave" commands. The online help may even tell you what the defaults are. That'll be in manage Jenkins -> configure system -> scroll down to "clouds" and look in the templates you've defined ... if you are using this plug-in to provide your executors and not the docker-workflow-plugin, that is ;-)

pjdarton avatar Oct 01 '19 23:10 pjdarton

I will take a look. Away from the office ATM so will be tomorrow. In the meantime...

How do we differentiate between which plugin is being used by the commands? I believe the pipeline script is being used based on experimentation, syntax, and discussion in other threads for docker-plugin, but how do I confirm?

We don't have any clouds defined as, right now, we're dealing with specific Jenkins slaves w/Docker installed so we can define the images in the declarative pipeline script itself (and thus in SCM) and/or Dockerfile files rather than the Jenkins UI. In our case I think the relevant connection settings would be under the slave itself, in the nodes section of the Jenkins master.

bverkron avatar Oct 02 '19 00:10 bverkron

Using this plugin and not the docker-workflow plugin ;-), I get the following result:

Asked to provision 1 slave(s) for: win-agent
Oct 02, 2019 8:45:02 AM INFO io.jenkins.docker.client.DockerAPI$1 entryDroppedFromCache
Dropped connection io.jenkins.docker.client.DockerAPI$SharableDockerClient@1fe310cf to DockerClientParameters{dockerUri=tcp://lp13007:2375, credentialsId=null, readTimeoutInMsOrNull=300000, connectTimeoutInMsOrNull=60000}
Oct 02, 2019 8:45:02 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Provisioning 'localhost:5000/win-agent' number 1 (of 1) on 'windows cloud agents'; Total containers: 0 (of 4)
Oct 02, 2019 8:45:02 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Will provision 'localhost:5000/win-agent', for label: 'win-agent', in cloud: 'windows cloud agents'
Oct 02, 2019 8:45:02 AM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
Started provisioning Image of localhost:5000/win-agent from windows cloud agents with 1 executors. Remaining excess workload: 0
Oct 02, 2019 8:45:02 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage
Pulling image 'localhost:5000/win-agent:latest'. This may take awhile...
Oct 02, 2019 8:45:02 AM INFO io.jenkins.docker.client.DockerAPI getOrMakeClient
Cached connection io.jenkins.docker.client.DockerAPI$SharableDockerClient@5ea08dda to DockerClientParameters{dockerUri=tcp://lp13007:2375, credentialsId=null, readTimeoutInMsOrNull=300000, connectTimeoutInMsOrNull=60000}
Oct 02, 2019 8:45:03 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage
Finished pulling image 'localhost:5000/win-agent:latest', took 994 ms
Oct 02, 2019 8:45:03 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Trying to run container for localhost:5000/win-agent
Oct 02, 2019 8:45:03 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Trying to run container for node win-agent-0001r6amcl1rs from image: localhost:5000/win-agent
Oct 02, 2019 8:45:03 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Started container ID 6e1c47d50b63636f580761fbe2b367ba53c902a07ad648eba7edd5c88c826d04 for node win-agent-0001r6amcl1rs from image: localhost:5000/win-agent
Oct 02, 2019 8:45:03 AM SEVERE com.github.dockerjava.core.async.ResultCallbackTemplate onError
Error during callback
com.github.dockerjava.api.exception.NotFoundException: {"message":"Could not find the file /root in container 6e1c47d50b63636f580761fbe2b367ba53c902a07ad648eba7edd5c88c826d04"}

	at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:103)
	at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:33)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
	at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	at java.lang.Thread.run(Thread.java:748)

Oct 02, 2019 8:45:03 AM SEVERE com.nirima.jenkins.plugins.docker.DockerCloud$1 run
Error in provisioning; template='DockerTemplate{configVersion=2, labelString='win-agent', connector=io.jenkins.docker.connector.DockerComputerSSHConnector@5f068bde, remoteFs='C:\Users\jenkins', instanceCap=1, mode=NORMAL, retentionStrategy=com.nirima.jenkins.plugins.docker.strategy.DockerOnceRetentionStrategy@4bcce0a5, dockerTemplateBase=DockerTemplateBase{image='localhost:5000/win-agent', pullCredentialsId='', registry=DockerRegistryEndpoint[null;credentialsId=null], dockerCommand='', hostname='', dnsHosts=[], network='', volumes=[], volumesFrom2=[], environment=[], bindPorts='', bindAllPorts=false, memoryLimit=null, memorySwap=null, cpuShares=null, shmSize=null, privileged=false, tty=false, macAddress='null', extraHosts=[]}, removeVolumes=false, pullStrategy=PULL_ALWAYS, nodeProperties=[], disabled=BySystem,0 ms,4 min 59 sec,Template provisioning failed.}' for cloud='windows cloud agents'
com.github.dockerjava.api.exception.NotFoundException: {"message":"Could not find the file /root in container 6e1c47d50b63636f580761fbe2b367ba53c902a07ad648eba7edd5c88c826d04"}

	at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:103)
	at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:33)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
	at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	at java.lang.Thread.run(Thread.java:748)

According to the message Could not find the file /root in container, it looks like a Linux container is still being assumed.

BlueAndi avatar Oct 02 '19 08:10 BlueAndi

What would cause Jenkins to use the docker-workflow-plugin instead of docker-plugin

It all comes down to what pipeline syntax you use. Different plugins provide different functionality with different words. The docker-plugin provides very little pipeline functionality, just the dockerNode keyword. The docker-workflow-plugin is what most people are using when they're doing pipelines with docker as that's what's documented in the Jenkins documentation - that's the plugin that provides pipeline keywords like withDockerRegistry or docker.image etc. One other indication is that if your logs mention any docker command-line stuff then that's the docker-workflow-plugin - the docker-plugin doesn't use/need a docker command-line client as it uses a Java docker client to talk to docker daemons.
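For comparison, here is roughly what the two styles look like in a script. This is a sketch from memory rather than a tested job: the dockerNode parameters shown (image, dockerHost), the docker host address, the node label and the image names are all assumptions for illustration, so check them against the Pipeline Syntax snippet generator on your own Jenkins.

```groovy
// docker-plugin: scripted pipeline using the dockerNode step.
// The plugin talks to the docker daemon through its Java client; no docker CLI is run.
// Assumption: 'jenkins/agent' and the tcp:// address are placeholders, not values from this thread.
dockerNode(image: 'jenkins/agent', dockerHost: 'tcp://docker-host.example:2375') {
    echo 'running on a container started by docker-plugin'
}

// docker-workflow-plugin (Docker Pipeline): the "docker" global variable.
// The agent needs a docker CLI, and the build log shows "$ docker ..." command lines.
// Assumption: 'docker-enabled-label' is a hypothetical label for a node with docker installed.
node('docker-enabled-label') {
    docker.image('maven').inside {
        sh 'mvn --version'
    }
}
```

If the build log contains lines such as "$ docker run ..." or "$ docker login ...", the second style (or its declarative equivalent) is what is actually in use.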

If you've defined some clouds and templates in Manage Jenkins -> Configure System -> docker clouds and then defined a pipeline to use a slave whose label matches one of the templates you've defined then your builds would be running on docker containers that are created by the docker-plugin. FYI that's the docker-plugin's primary use case.
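As a rough illustration of that use case, a declarative pipeline only needs to request a label that matches a template; nothing in the script itself mentions docker. This is a minimal sketch, assuming a template labelled win-agent (the label that appears in the provisioning logs earlier in this thread) whose image is a Windows container, hence bat rather than sh:

```groovy
pipeline {
    agent {
        // Assumption: 'win-agent' matches the label of a template defined under the docker cloud.
        label 'win-agent'
    }
    stages {
        stage('Example Build') {
            steps {
                // bat runs through cmd.exe, so it does not depend on sh or nohup being present.
                bat 'hostname'
            }
        }
    }
}
```

Whether the container actually comes up still depends on the template's connection method being one that works for Windows images.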

...and, yes, I know it's confusing, which is why the README, ISSUE_TEMPLATE and CONTRIBUTING docs in this plugin all mention this issue and tell folks to be sure of what they're using so they can report things in the right place, because it confuses the hell out of everyone and makes it all too easy to make mistakes.

docker-plugin is trying to use the nohup command

FYI the docker-plugin knows nothing of the nohup command; the word "nohup" is not in its code (it's not in the docker-workflow-plugin's code either). However, nohup is what the Jenkins durable-task-plugin's step will add when it's not on a Darwin (mac) OS (for Windows, it assumes you're using Cygwin and will have nohup).

I'd suggest that, when doing Windows pipelines, you use something like the bat pipeline command. Or, alternatively, use the echo pipeline command as that's platform agnostic, or run a groovy command to list all the environment variables etc.
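One way to keep a single script usable on both kinds of agent is to branch on the isUnix step. A minimal sketch, assuming the dockerEnabledWindows label from the earlier script and a plain node agent rather than a docker {} agent:

```groovy
pipeline {
    agent {
        // Assumption: same node label as in the script posted above.
        label 'dockerEnabledWindows'
    }
    stages {
        stage('Example Build') {
            steps {
                script {
                    // isUnix() returns false on a Windows agent, so bat is used there.
                    if (isUnix()) {
                        sh 'hostname'
                    } else {
                        bat 'hostname'
                    }
                }
            }
        }
    }
}
```

The script block is needed because declarative steps do not allow a bare if/else.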

Could not find the file /root in container

Aha! Yes, now we're getting somewhere :grin: These logs did come from the docker-plugin. I've checked the plugin's code and, sure enough, /root/ is in the code - if your template is defined to use the SSH connector and you're using an injected key then it'll start the container with the command /usr/sbin/sshd -D -p <port> -o AuthorizedKeysCommand=/root/authorized_key -o AuthorizedKeysCommandUser=root (see DockerComputerSSHConnector.java line 180) ... and would also try to run a /bin/sh script in the container to inject the key when the container starts too, so that's not going to work on Windows. However, even if you don't inject a key, the SSH connector would still tell the container to start with the command /usr/sbin/sshd -D -p <port>, and that's unlikely to work on Windows either.

I think you need to get your Windows containers to use the JNLP or Attach connection method instead. It looks like the SSH method has a lot of hard-coded unixy stuff in it at present. ...and, to be frank, I wouldn't recommend relying on SSH on Windows - an SSH environment in Windows is tricky to make work and be secure; you usually end up having to choose between "secure" and "lets you do everything your build needs to do".

It looks like the Attach method runs the following command on the container: java -jar <remoteFs>/slave.jar -noReconnect -noKeepAlive -slaveLog <remoteFs>/agent.log. So, as long as you ensure that java is on the %PATH%, that you're setting the template's remoteFs correctly, and that slave.jar is already present there, it's likely to "just work".

The JNLP method provides more customisation capabilities (hidden in its "advanced" bits) so you can specify exactly what command the container should run, so if Attach doesn't work then JNLP can be forced to work.

pjdarton avatar Oct 02 '19 10:10 pjdarton

The docker-plugin provides very little pipeline functionality, just the dockerNode keyword.

This is the critical differentiator and sadly this wasn't mentioned anywhere in the extensive reading I've done on these plugins.

...and, yes, I know it's confusing, which is why the README, ISSUE_TEMPLATE and CONTRIBUTING docs in this plugin all mention this issue and tell folks to be sure of what they're using so they can report things in the right place, because it confuses the hell out of everyone and makes it all too easy to make mistakes.

I have read those and do apologize for letting this thread slip from a clarification request for the original FR into a Q&A. I did try the mailing list and even Reddit and it's mostly dead air out there for these kinds of questions. Zero responses elsewhere, sadly. Additionally, even in those links there isn't anything that clearly states how to tell the difference at the top layer of the syntax (so to speak). We are using just docker {} and dockerfile {} syntax without directly specifying any of the underlying calls like withDockerRegistry, so until that started bubbling up in some of these Windows jobs it was not clear to us that the docker-workflow (aka pipeline) plugin was actually being used, and there seemed to be evidence to the contrary. Furthermore, even in deep conversations in this thread, despite detailed examples of what we were trying to do (which I would have thought would have made it obvious which plugin was actually being used), nothing was said about the difference between docker-plugin and docker-workflow-plugin in our context.

Since we want to "codify" everything directly in the declarative pipeline scripts and/or Dockerfile files (via SCM) it seems we're restricted to using the docker pipeline plugin. It's unclear to me if this docker-plugin may be able to suit our needs via dockerNode in declarative pipeline if/when PR 681 is eventually released. Hopefully one day things will be merged / deprecated / documented as necessary to make this all more clear.

I'd suggest that, when doing Windows pipelines, you use something like the bat pipeline command. Or, alternatively, use the echo pipeline command as that's platform agnostic, or run a groovy command to list all the environment variables etc.

Attempted to use both echo and bat but the nohup failure is occurring before those lines are even reached. It's failing at the first step of creating the container as far as I can tell from the log. I will investigate the cygwin approach and try to pursue this elsewhere in the context of the docker pipeline plugin.

Greatly appreciate you taking the time to respond here and apologies again for derailing this thread.

bverkron avatar Oct 02 '19 17:10 bverkron

sadly this wasn't mentioned anywhere

Yes; sadly a lack of docs is not an uncommon issue with free software - the folks who write code to make it do things don't need documentation telling them what they did. Where I've enhanced the docker-plugin, I've tried to also enhance the help text that's built into the UI, but that doesn't affect the official documentation (which is mostly telling folks to use stuff that's provided by the docker-workflow-plugin ... which is why I believed that this plugin would be superseded by it until the discussion in #681 said otherwise).

if/when PR 681 is eventually released

FYI you can try out PR 681 right now - or any PR - go to the "checks" bit at the bottom, click "show checks" and follow the link to the build (the pr-merge bit) "details" to take you to the Jenkins ci server that built it, and then to the "Artifacts" from the build - there you'll find a .hpi file you can download and install (via manage jenkins -> manage plugins -> advanced -> upload). To be honest, I could really do with a 2nd-opinion on that PR as it's totally outside my knowhow, so please give it a test and let me know if it's worth having.

nohup

I'm not sure where it's coming from, as github.com can't find "nohup" in docker-plugin or docker-workflow-plugin. If you can figure out what parameters are being used when the container is created then you'll be able to see if it's something coming from Jenkins or something built into the container image itself (maybe docker inspect can help here too).

taking the time to respond

FYI I'd like to have things working on Windows too; at present, where I work, all our Windows-based stuff is on VMs (which take an age to boot up) and docker containers are lighter-weight and more efficient, which would mean I get more builds done on existing hardware. If I can un-stick you, maybe I'll learn how to do the same myself... i.e. it's not all altruism - I want it too ;-)

pjdarton avatar Oct 02 '19 17:10 pjdarton

Yes; sadly a lack of docs is not an uncommon issue with free software - the folks who write code to make it do things don't need documentation telling them what they did.

Indeed. I totally get the struggle. Developers' time is precious, especially when, as is often the case, the work is done 'side of the desk' to a real job or other commitments. The efforts are greatly appreciated and valuable to so many people. Documentation falls to the back burner 90% of the time and I've seen full blown commercial, enterprise (and expensive) software with worse documentation than open-source projects maintained by a single person. Docs are often the trade-off for free software.

if/when PR 681 is eventually released

FYI you can try out PR 681 right now

I would love to do this but unsure if I will be able to near-term. I have some other things like this Windows build agent I need to squash first. Granted the docker-plugin might help solve that or work around it but it seems like there are some underlying issues here that need sorting first.

nohup

I'm not sure where it's coming from

The Jenkins job build log doesn't give any visibility and the main Jenkins master log has no entries related to this job. Is there additional logging that can be enabled that's relevant to this?

Running on Windows in C:/jenkins/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave Windows
[Pipeline] {
[Pipeline] sh
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
...
Caused: java.io.IOException: Cannot run program "nohup" 

I ran another test pipeline job that uses the same Windows node without the docker {} syntax and it ran fine. The docker based job is still trying to run an sh command at the start of the job despite sh not being used anywhere in the pipeline script explicitly. Hence my assumption it's related to the docker plugin (though I guess docker pipeline plugin in this case). That's the only difference in the pipeline scripts. Perhaps, as you said earlier, sh and by proxy nohup are being called in durable-task-plugin?

Successful non-docker

pipeline {
    agent {
        label 'dockerEnabledWindowsSlave'
    }
    stages {
        stage('Example Build') {
            steps {
                echo 'test'
            }
        }
    }
}

Unsuccessful docker job

pipeline {
    agent {
        docker {
            image 'iis'
            label 'dockerEnabledWindowsSlave'
        }
    }
    stages {
        stage('Example Build') {
            steps {
                echo 'test'
            }
        }
    }
}

i.e. it's not all altruism - I want it too ;-)

Maybe we can figure something out together and even update the doc :D

At this point I may try to get nohup working via this method mentioned earlier to see if we can at least get to the next step and see what it's trying to do unless you have another suggestion.
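
A quick way to confirm whether the agent can actually see these tools is to check the PATH from the same account the agent runs as. The following is only a minimal sketch in Python, not part of either plugin: the `missing_tools` helper name is made up for illustration, and it assumes that `sh` and `nohup` are the two programs the failing step needs, as the error above suggests.

```python
import shutil


def missing_tools(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumption: these are the programs the failing step tries to launch.
    missing = missing_tools(["sh", "nohup"])
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("sh and nohup are both on PATH")
```

If either name is reported as missing, the "Cannot run program" error is expected no matter what the pipeline script itself contains.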

Edit: I successfully set up the git-bash tools via this suggestion and it resolved the sh/nohup issue. Now I am running into the "invalid volume specification" error, which I see you've discussed here and which is clearly related to the docker pipeline plugin, not this one (as we've already established): https://github.com/jenkinsci/docker-plugin/issues/666. Looking here, it seems like this error might be a dead end for docker pipeline on Windows. I'll keep digging elsewhere.

Suggestion: maybe include something at the bottom of the readme.md for docker-plugin that mentions that dockerNode and the Jenkins UI are specific to this plugin, and that the absence of those also means it's probably the docker-workflow-plugin that's being used?

bverkron avatar Oct 02 '19 18:10 bverkron

Looking here it seems like this error might be a dead end for docker pipeline on Windows.

Assuming this is true we may need to migrate to this docker-plugin at least for Windows stuff if not everything. That being said can docker-plugin support declarative pipelines AND Dockerfile files referenced in the declarative pipeline script? It doesn't seem like it based on the documentation and my experimenting thus far.

bverkron avatar Oct 02 '19 19:10 bverkron

I think you need to get your Windows containers to use the JNLP or Attach connection method instead. It looks like the SSH method has a lot of hard-coded unixy stuff in it at present. ...and, to be frank, I wouldn't recommend relying on SSH on Windows - an SSH environment in Windows is tricky to make work and be secure; you usually end up having to choose between "secure" and "lets you do everything your build needs to do".

You are right, it's configured with SSH key injection. In the container, OpenSSH is already running and works. ;-) I can connect to it via SSH. Therefore the idea was to use the same mechanism as for the Linux containers.

But I will try the JNLP approach, as the Windows containers are running on a different machine than the Jenkins master.

BlueAndi avatar Oct 04 '19 18:10 BlueAndi

I've got our entire build setup working on Windows-based images. Your configuration is incorrect; this should not be an issue for this plugin.

Heneman avatar Nov 07 '19 17:11 Heneman

With ssh key injection configured?

BlueAndi avatar Nov 07 '19 17:11 BlueAndi

Update: I recently had to go delving in this area in order to fix the SSH unit-tests and so I took a good long look at the code. In the process of trying to figure out why the SSH-connector unit tests had stopped working, I coded up a connector that avoids specifying /bin/sshd as the CMD; it just passes the SSH-key as the sole argument (i.e. exactly as the standard Jenkins SSH-slave image wants).

Disclaimer: This code is not finished. It's not as polished as it could be; at the very least, it'll need improvements to the online help to explain the difference between the connection methods (as it's obvious that this plugin needs better docs in this area!). It's also not tested - the only testing I've done is run the unit-tests, and only on linux (I have no Windows docker resource at present). However, the new SSH connection method (if it works at all) might well work for Windows docker folks where the others do not; it might be worth your while trying it out.

You can find this code in PR #763 and you can find a .hpi file here - that .hpi file is built from the master branch (i.e. latest bleeding-edge code, aka release 1.1.9 right now) plus that PR's changes. If these changes are well received then it'd be worthwhile improving them to the point where they're fit for merge...

pjdarton avatar Nov 25 '19 11:11 pjdarton

@pjdarton This sounds good. I hope I can try it today and give some feedback.

BlueAndi avatar Dec 05 '19 08:12 BlueAndi

The same error happened:

Provisioning 'lp13007:5000/docker-ssh-slave:win-1903' number 1 (of 1) on 'windows cloud agents'; Total containers: 0 (of 4)
Dec 05, 2019 11:03:32 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Will provision 'lp13007:5000/docker-ssh-slave:win-1903', for label: 'win-agent', in cloud: 'windows cloud agents'
Dec 05, 2019 11:03:32 AM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
Started provisioning Image of lp13007:5000/docker-ssh-slave:win-1903 from windows cloud agents with 1 executors. Remaining excess workload: 0
Dec 05, 2019 11:03:32 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage
Pulling image 'lp13007:5000/docker-ssh-slave:win-1903'. This may take awhile...
Dec 05, 2019 11:03:32 AM INFO io.jenkins.docker.client.DockerAPI getOrMakeClient
Cached connection io.jenkins.docker.client.DockerAPI$SharableDockerClient@1c14058f to DockerClientParameters{dockerUri=tcp://lp13007:2375, credentialsId=null, readTimeoutInMsOrNull=300000, connectTimeoutInMsOrNull=60000}
Dec 05, 2019 11:03:33 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage
Finished pulling image 'lp13007:5000/docker-ssh-slave:win-1903', took 1058 ms
Dec 05, 2019 11:03:33 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Trying to run container for lp13007:5000/docker-ssh-slave:win-1903
Dec 05, 2019 11:03:33 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Trying to run container for node win-agent-0002nvuz6oiow from image: lp13007:5000/docker-ssh-slave:win-1903
Dec 05, 2019 11:03:33 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Started container ID 96ef9a8f7f43d232a75c52e4ad350ce66d457da9ddb93798cef62afc6fc290a8 for node win-agent-0002nvuz6oiow from image: lp13007:5000/docker-ssh-slave:win-1903
Dec 05, 2019 11:03:33 AM SEVERE com.github.dockerjava.core.async.ResultCallbackTemplate onError
Error during callback
com.github.dockerjava.api.exception.NotFoundException: {"message":"Could not find the file /root in container 96ef9a8f7f43d232a75c52e4ad350ce66d457da9ddb93798cef62afc6fc290a8"}

=> "Could not find the file /root in container ...

BlueAndi avatar Dec 05 '19 11:12 BlueAndi

The only place /root happens in the plugin is when it's using the InjectSSHKey connection method, which (with this PR's code installed) shows up in the WebUI configuration page as "Inject SSH key using SSH AuthorizedKeysCommand option" (previously, this option was simply called "Inject SSH key").

You need to switch to the new InjectSSHKeyAsContainerArgument connection method which will show up in the WebUI configuration page as "Inject SSH key as 1st container argument".

pjdarton avatar Dec 05 '19 14:12 pjdarton

Argh ... I forgot to switch to InjectSSHKeyAsContainerArgument. I will try again and come back.

BlueAndi avatar Dec 05 '19 17:12 BlueAndi

The result looks better now. The container itself is started and, according to the logs, an SSH connection was established (SSH port is open on lp13007:55137).

The Jenkins pipeline script should now just call a PowerShell 'dir' command, but this doesn't happen.

According to the logs, another agent is requested, and so on.

Logs:

Started container ID 5d61972f90fc76d696fad74efb9866eeaf1e598143878c348f506f3d9d597196 for node win-agent-0003fn8t8zrkc from image: lp13007:5000/docker-ssh-slave:win-1903
Dec 06, 2019 8:49:03 AM INFO com.nirima.jenkins.plugins.docker.utils.PortUtils$ConnectionCheckSSH executeOnce
SSH port is open on lp13007:55147
Dec 06, 2019 8:49:04 AM INFO hudson.slaves.NodeProvisioner lambda$update$6
Image of lp13007:5000/docker-ssh-slave:win-1903 provisioning successfully completed. We have now 3 computer(s)
Dec 06, 2019 8:49:04 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:04 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
Dec 06, 2019 8:49:04 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:04 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
Dec 06, 2019 8:49:14 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:14 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
Dec 06, 2019 8:49:14 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:14 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
Dec 06, 2019 8:49:24 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:24 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
Dec 06, 2019 8:49:24 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:24 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'

This repeats after the container watchdog is triggered.

BlueAndi avatar Dec 06 '19 08:12 BlueAndi

Hmm. Those logs show that the container started and its SSH port opened. They don't show much more than that :worried: ... but at least they're not showing an exception :grin:

The fact that the docker plugin is still being asked to provision a node 20 seconds later implies that the slave failed to come online (i.e. the container exists, but Jenkins wasn't able to connect to it and run the Jenkins slave.jar code on it), which would imply that the SSH connection process didn't complete ... but that wouldn't show up here.

I think that the next place to look would be the log for the slave node itself.

i.e. you should see the docker slave node appearing in Jenkins' list of executors/slaves and that WebUI page has a "log" page on it - check what that's reporting as that's where any SSHConnector issues will be shown. (For example, when I was debugging why the plugin's ssh-connector unit-tests were failing, I eventually found the "we can't find where java is on this container" error in the slave's log page)
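
To make that reasoning concrete, here is a rough Python sketch that scans a cloud log for the pattern described above: a "provisioning successfully completed" line followed by repeated "Template instance limit ... reached" refusals. It is only an illustration under stated assumptions; the `refusals_after_success` helper is hypothetical, and the message wording is copied from the log excerpt posted earlier, so it may differ between plugin versions.

```python
import re

SUCCESS = re.compile(r"provisioning successfully completed")
REFUSED = re.compile(r"Not Provisioning '[^']+'\. Template instance limit of '\d+' reached")


def refusals_after_success(log_text):
    """Count 'instance limit reached' refusals logged after a successful provision."""
    seen_success = False
    refusals = 0
    for line in log_text.splitlines():
        if SUCCESS.search(line):
            seen_success = True
        elif seen_success and REFUSED.search(line):
            refusals += 1
    return refusals


# Excerpt taken from the log posted above.
SAMPLE_LOG = """\
Image of lp13007:5000/docker-ssh-slave:win-1903 provisioning successfully completed. We have now 3 computer(s)
Asked to provision 1 slave(s) for: win-agent
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
Asked to provision 1 slave(s) for: win-agent
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
"""

print(refusals_after_success(SAMPLE_LOG))  # prints 2
```

A non-zero count only says that Jenkins kept asking for capacity after the container was created; the reason the agent never came online still has to be read from the node's own log.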

pjdarton avatar Dec 06 '19 13:12 pjdarton

Checking the agent's log was a good hint:

SSHLauncher{host='lp13007', port=55267, credentialsId='InstanceIdentity', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=30, retryWaitTime=2, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[12/09/19 11:51:48] [SSH] Opening SSH connection to lp13007:55267.
[12/09/19 11:51:49] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection.
ERROR: Server rejected the 1 private key(s) for root (credentialId:InstanceIdentity/method:publickey)
[12/09/19 11:51:49] [SSH] Authentication failed.
Authentication failed.
[12/09/19 11:51:49] Launch failed - cleaning up connection
[12/09/19 11:51:49] [SSH] Connection closed.

The node itself uses an image (a Windows container) based on https://github.com/jenkinsci/docker-ssh-slave

BlueAndi avatar Dec 09 '19 11:12 BlueAndi

OK, so the WARNING: SSH Host Keys are not being verified is a good sign... ...but the ERROR: Server rejected the 1 private key(s) for root is not.

However, the cause is revealed right there - you're trying to login to a Windows container as "root". I believe that the username should be jenkins (for both the Windows and Linux docker-ssh-slave images). i.e. it must not be root for a Windows image :grin:

Once you've sorted that out, if it still isn't working then the next step of the investigation is to use docker inspect on the container (which you'll have to do soon after it's created, otherwise it'll be cleaned up). What we're looking for there is an indication of the argument(s) provided to the container (by the docker plugin's new InjectSSHKeyAsContainerArgument method). If that looks good then that implies that the fault is in the container ... if it doesn't look right then the fault would be in my code.

FYI what I'm expecting is that you'll tell the docker-plugin code that it needs to log in as "jenkins" (and probably have to also tell it the home directory is C:/Users/jenkins too), the plugin will tell the container the public key it should accept (which will show up in docker inspect), and then the SSH connection code should try to connect as user "jenkins" using the private key matching the public key the container was told to accept (which will show up in the slave's log) ... and it should work...
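
As a rough illustration of what to look for, the Python sketch below pulls the entrypoint path and arguments out of the JSON that docker inspect prints. The `container_command` helper is hypothetical and the sample data is a heavily trimmed placeholder, not real output from this setup; it assumes only the top-level `Path` and `Args` fields of the inspect output.

```python
import json


def container_command(inspect_output):
    """Return (path, args) for the first container in `docker inspect` JSON output."""
    containers = json.loads(inspect_output)
    if not containers:
        raise ValueError("docker inspect returned no containers")
    first = containers[0]
    return first.get("Path"), first.get("Args") or []


# Hypothetical, heavily trimmed output; the path and key are placeholders.
SAMPLE = """[{"Id": "5d61972f90fc", "Path": "entrypoint-placeholder", "Args": ["ssh-rsa AAAA-placeholder"]}]"""

path, args = container_command(SAMPLE)
print("entrypoint:", path)
for arg in args:
    print("argument:", arg)
```

In practice the input would be the output of docker inspect for the container ID shown in the provisioning log, captured on the docker host shortly after the container starts. If the public key is not among the arguments, that points at the plugin side rather than the image.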

pjdarton avatar Dec 10 '19 14:12 pjdarton

The "Remote File System Root" configuration parameter is set to "C:\Users\jenkins". The user inside the container should be jenkins, that's true. But I cannot configure the user with InjectSSHKeyAsContainerArgument, only with a different configuration. This may be the problem.

BlueAndi avatar Dec 12 '19 13:12 BlueAndi

Ah... yes, that would be a problem - it definitely needs to be jenkins for the public Windows docker image (unless you've built one yourself and passed in USER=root during the image build process). I probably missed out a config.jelly file allowing this to be configured (I've never fully understood the Jenkins jelly/binding process so some "trial and error" always seems necessary...)

Before I go rummaging in the code ... can you confirm that you're able to configure the username for the original key-injection method, but you can't for the new one (meaning that it stays with the default value of root)? If that's correct then I'm pretty sure I know what I'll need to do to fix this... I just don't have time right this minute...

In the meantime: if you are able to manually edit your Jenkins server's config.xml file then you need not wait for me to get that done - you can use a text editor to fiddle with the cloud configuration even though the Jenkins WebUI is missing that field. Use the Jenkins WebUI to configure Jenkins to use the new connection method (which will tell it to use the default root username) and save that (which will save the data to the file config.xml in your Jenkins home directory). Edit the config.xml file and look for the <connector class="io.jenkins.docker.connector.DockerComputerSSHConnector"> section. Find the <sshKeyStrategy class="io.jenkins.docker.connector.DockerComputerSSHConnector$InjectSSHKeyAsContainerArgument"> element - that should have a <user>root</user> element (or might not have that element at all). Change "root" to "jenkins":

<sshKeyStrategy class="io.jenkins.docker.connector.DockerComputerSSHConnector$InjectSSHKeyAsContainerArgument">
  <user>jenkins</user>
</sshKeyStrategy>

Save the file. Tell Jenkins to "Reload configuration from disk".
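
For anyone who would rather script that edit than do it by hand, here is a minimal Python sketch of the same change. It is an illustration only: the `set_ssh_user` helper is hypothetical, and the sample fragment is assumed to mirror the element names quoted above. A real config.xml may also start with an XML 1.1 declaration, which not every parser accepts, so the sketch works on an in-memory fragment rather than on the file itself.

```python
import xml.etree.ElementTree as ET

STRATEGY_CLASS = (
    "io.jenkins.docker.connector.DockerComputerSSHConnector"
    "$InjectSSHKeyAsContainerArgument"
)


def set_ssh_user(root, user="jenkins"):
    """Set <user> on every matching <sshKeyStrategy>; return how many were changed."""
    changed = 0
    for strategy in root.iter("sshKeyStrategy"):
        if strategy.get("class") != STRATEGY_CLASS:
            continue
        user_el = strategy.find("user")
        if user_el is None:
            # The element may be absent when the default (root) is in use.
            user_el = ET.SubElement(strategy, "user")
        user_el.text = user
        changed += 1
    return changed


# Assumed fragment, shaped like the section described above.
SAMPLE = """<connector class="io.jenkins.docker.connector.DockerComputerSSHConnector">
  <sshKeyStrategy class="io.jenkins.docker.connector.DockerComputerSSHConnector$InjectSSHKeyAsContainerArgument">
    <user>root</user>
  </sshKeyStrategy>
</connector>"""

root = ET.fromstring(SAMPLE)
count = set_ssh_user(root)
print(count, root.find("sshKeyStrategy/user").text)
```

Take a backup of config.xml before trying anything like this on a live server, and still use "Reload configuration from disk" afterwards.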

That should let you manually do what the WebUI doesn't currently let you do, and should let you get further with testing while you wait for an updated plugin (which might take a while as I'm busy on other things at present - I can spare the time to type out advice here, but coding will take longer...)

pjdarton avatar Dec 12 '19 14:12 pjdarton

There are three options to choose:

  • Inject SSH key as 1st container argument
  • Inject SSH key using SSH AuthorizedKeysCommand option
  • Use configured SSH credentials

Choosing the 2nd one, it's possible to enter the username. I guess it's the old "Inject SSH key" option, but you changed the text, didn't you?

I will follow your suggestion, changing the username in the xml file and come back with feedback.

BTW, before I forget: thanks for your support, it's very much appreciated! We're doing something similar to pair programming: you advise and I test. ;-)

BlueAndi avatar Dec 12 '19 15:12 BlueAndi