docker-plugin
[Feature request] Windows container support
Would be great to have windows container support as well.
What exactly is / isn't supported on the Windows side for this plugin?
We currently have ephemeral Jenkins build agents on the Linux side via this plugin and need to set up the same for Windows.
That's a good question, and I don't know the answer as I don't use Windows containers myself (and nobody who does has set out exactly what the issues are).
FYI, internally, the plugin doesn't care what OS you're using. It's all Java (as is Jenkins as a whole), talking to the docker daemon(s) via a Java library (docker-java), so if Microsoft's implementation of docker is a compliant implementation of docker (rather than something that is not docker but that Microsoft calls docker, which can happen when corporations believe there's "no standard they can't improve on" :angry: ) then it should "just work" ... but I presume there must be at least one reason why it doesn't "just work", otherwise folks wouldn't be raising this kind of issue.
If anyone's willing to investigate and implement this (see CONTRIBUTING guidelines) then I'd be happy to review the code and, ultimately, merge things in.
Sounds like my assumption that it should work (but might have some quirks / problems) was correct.
We're attempting to use docker containers in our declarative pipelines, but it's falling down in an odd place on the Windows slaves, whereas it worked on the Linux side with an identical setup.
It's not clear to me if this is caused by the way the plugin behaves with Windows slaves or if it's some different behavior in the Windows implementation of Docker.
Pipeline script (same syntax as the working Linux jobs except for the image and label values):
pipeline {
    agent {
        docker {
            image 'iis'
            label 'dockerEnabledWindows'
        }
    }
    stages {
        stage('Example Build') {
            steps {
                sh 'hostname'
            }
        }
    }
}
Build Console Log
Running on Windows in C:/jenkins/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled Windows slave
[Pipeline] {
[Pipeline] withEnv
[Pipeline] {
[Pipeline] withDockerRegistry
Using the existing docker config file. Removing blacklisted property: auths
$ docker login -u ****** -p ******** https://index.docker.io/v1/
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
Error response from daemon: Get https://registry-1.docker.io/v2/: unauthorized: incorrect username or password
[Pipeline] // withDockerRegistry
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
ERROR: docker login failed
Finished: FAILURE
Below is the first error we got, so I set up an account with Docker Hub and ran the docker login command to cache the login, hoping that would work. But it just produced a variation of the same error (above).
$ docker login -u ******* -p ******** https://index.docker.io/v1/
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
Error response from daemon: Get https://registry-1.docker.io/v2/: unauthorized: incorrect username or password
The docker login is the odd piece to me, as we're not making any explicit attempt to connect to a registry (and definitely not a private one), so in my opinion there should be no need for the login command nor credentials. On the Linux side it happily connects to Docker Hub without the login command, as far as I can tell, and we've never had to do anything with credentials.
I'm trying to determine where the problem might lie so I know whether to dig down the Jenkins docker-plugin path or the Windows Docker configuration path.
Ah, withDockerRegistry suggests that you're using the docker-workflow-plugin, not the docker-plugin.
Different plugin, different way of working, different GitHub repo ... and not a plugin I know much about, aside from everyone confusing it for this one.
What would cause Jenkins to use the docker-workflow-plugin instead of docker-plugin like the other job? They are configured the same way (literally just the declarative pipeline script above), just pointing at a Linux vs Windows node / image.
This is what the Linux node job runs...
[Pipeline] {
[Pipeline] sh
+ docker inspect -f . maven
.
[Pipeline] withDockerContainer
Linux does not seem to be running inside a container
$ docker run -t -d -u 164263:164263 -w "/home/_jenkinsauto/remote/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave@2" -v "/home/_jenkinsauto/remote/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave@2:/home/_jenkinsauto/remote/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave@2:rw,z" -v "/home/_jenkinsauto/remote/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave@2@tmp:/home/_jenkinsauto/remote/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave@2@tmp:rw,z" -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** maven cat
$ docker top 489b5b3d2dc7327c787cfdf8044a945b9735737c690c3162abdb45e051138e71 -eo pid,comm
[Pipeline] {
[Pipeline] stage
[Pipeline] { (Example Build)
[Pipeline] sh
...
Edit: Re-running the Linux job now returns the same docker login error as the Windows job. AFAIK nothing in the Jenkins master config has changed. Where / how do we specify which plugin to use? Uninstalling the docker-workflow-plugin (i.e. Docker Pipeline) doesn't seem to be an option, as it's listed as inactive under pluginManager/installed and I can't interact with it.
My test setup was a PC with Windows 10, Docker Desktop CE installed and the docker daemon running, as well as a registry on it. It provides several agents based on Windows Docker images. The Jenkins master runs on a different PC.
I configured a "windows" cloud in the Jenkins master (provided by the docker-plugin) with a test agent.
Using this test agent in a job results in a corresponding Docker container being created, but access to it then fails.
Tomorrow I can provide the error logs and more information about the configuration.
Edit: Re-running the linux job now returns the same docker login error as the Windows job. AFAIK nothing in the Jenkins master config has changed. Where / how do we specify which plugin to use?
Found the solution to the docker login problem. Under Manage Jenkins > Configure System > Pipeline Model Definition, a value had been selected under Registry credentials. Since the other two fields were blank, it was simply trying to force a login to the public Docker Hub with the selected credentials, and, being a global setting, it was affecting all docker-related jobs.
This also indicates that the pipeline scripts were not actually using the docker-workflow-plugin (aka Docker Pipeline) plugin as previously suggested.
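For anyone else hitting this: instead of relying on the global Pipeline Model Definition setting, registry credentials can be scoped to a single job via the docker-workflow-plugin's withDockerRegistry step. A minimal sketch, assuming you actually need authenticated access (the credentials ID 'docker-hub-creds' is a placeholder, not something from this thread; public Docker Hub images need no login at all):

```groovy
// Scripted-pipeline sketch: scope registry credentials to one job
// rather than configuring them globally.
// 'docker-hub-creds' is a hypothetical Jenkins credentials ID.
node('dockerEnabledWindows') {
    withDockerRegistry(credentialsId: 'docker-hub-creds',
                       url: 'https://index.docker.io/v1/') {
        // docker pulls/pushes inside this block use those credentials
        docker.image('iis').pull()
    }
}
```

This keeps the login confined to jobs that genuinely need it, instead of a global setting affecting every docker-related job.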
Now the problem becomes the docker-plugin seemingly trying to treat the Windows host as a Linux host and trying to execute the nohup command.
Running on Windows in C:/jenkins/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave Windows
[Pipeline] {
[Pipeline] sh
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
...
Caused: java.io.IOException: Cannot run program "nohup" (in directory "C:\jenkins\workspace\Ephemeral Build Agent PoC\Ephemeral Build Agent PoC v6 Docker enabled slave Windows"): CreateProcess error=2, The system cannot find the file specified
This could be a configuration thing, but it also seems that the plugin is trying to use the nohup command when doing the docker steps, which of course isn't available on Windows by default.
I think this can be solved via a process like this (i.e. installing & configuring git-bash or similar) but is this the expected / correct setup for docker on Windows slave hosts? https://stackoverflow.com/a/45151156
Will poke around but any guidance would be appreciated.
Disclaimer: on mobile, from home, going from memory and not looking stuff up as it's my bed time...
Take a look at the "advanced" connection properties, e.g. jnlp or direct or SSH. In there you may find the ability to override default "start slave" commands. The online help may even tell you what the defaults are. That'll be in manage Jenkins -> configure system -> scroll down to "clouds" and look in the templates you've defined ... if you are using this plug-in to provide your executors and not the docker-workflow-plugin, that is ;-)
I will take a look. Away from the office ATM so will be tomorrow. In the meantime...
How do we differentiate which plugin is being used by the commands? I believe the pipeline script is being used based on experimentation, syntax, and discussion in other threads for docker-plugin, but how do I confirm?
We don't have any clouds defined as, right now, we're dealing with specific Jenkins slaves with Docker installed, so we can define the images in the declarative pipeline script itself (and thus in SCM) and/or in Dockerfile files rather than in the Jenkins UI. In our case I think the relevant connection settings would be under the slave itself, in the nodes section of the Jenkins master.
Using this plugin and not the docker-workflow plugin ;-), I get the following result:
Asked to provision 1 slave(s) for: win-agent
Oct 02, 2019 8:45:02 AM INFO io.jenkins.docker.client.DockerAPI$1 entryDroppedFromCache
Dropped connection io.jenkins.docker.client.DockerAPI$SharableDockerClient@1fe310cf to DockerClientParameters{dockerUri=tcp://lp13007:2375, credentialsId=null, readTimeoutInMsOrNull=300000, connectTimeoutInMsOrNull=60000}
Oct 02, 2019 8:45:02 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Provisioning 'localhost:5000/win-agent' number 1 (of 1) on 'windows cloud agents'; Total containers: 0 (of 4)
Oct 02, 2019 8:45:02 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Will provision 'localhost:5000/win-agent', for label: 'win-agent', in cloud: 'windows cloud agents'
Oct 02, 2019 8:45:02 AM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
Started provisioning Image of localhost:5000/win-agent from windows cloud agents with 1 executors. Remaining excess workload: 0
Oct 02, 2019 8:45:02 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage
Pulling image 'localhost:5000/win-agent:latest'. This may take awhile...
Oct 02, 2019 8:45:02 AM INFO io.jenkins.docker.client.DockerAPI getOrMakeClient
Cached connection io.jenkins.docker.client.DockerAPI$SharableDockerClient@5ea08dda to DockerClientParameters{dockerUri=tcp://lp13007:2375, credentialsId=null, readTimeoutInMsOrNull=300000, connectTimeoutInMsOrNull=60000}
Oct 02, 2019 8:45:03 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage
Finished pulling image 'localhost:5000/win-agent:latest', took 994 ms
Oct 02, 2019 8:45:03 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Trying to run container for localhost:5000/win-agent
Oct 02, 2019 8:45:03 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Trying to run container for node win-agent-0001r6amcl1rs from image: localhost:5000/win-agent
Oct 02, 2019 8:45:03 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Started container ID 6e1c47d50b63636f580761fbe2b367ba53c902a07ad648eba7edd5c88c826d04 for node win-agent-0001r6amcl1rs from image: localhost:5000/win-agent
Oct 02, 2019 8:45:03 AM SEVERE com.github.dockerjava.core.async.ResultCallbackTemplate onError
Error during callback
com.github.dockerjava.api.exception.NotFoundException: {"message":"Could not find the file /root in container 6e1c47d50b63636f580761fbe2b367ba53c902a07ad648eba7edd5c88c826d04"}
at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:103)
at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:33)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
Oct 02, 2019 8:45:03 AM SEVERE com.nirima.jenkins.plugins.docker.DockerCloud$1 run
Error in provisioning; template='DockerTemplate{configVersion=2, labelString='win-agent', connector=io.jenkins.docker.connector.DockerComputerSSHConnector@5f068bde, remoteFs='C:\Users\jenkins', instanceCap=1, mode=NORMAL, retentionStrategy=com.nirima.jenkins.plugins.docker.strategy.DockerOnceRetentionStrategy@4bcce0a5, dockerTemplateBase=DockerTemplateBase{image='localhost:5000/win-agent', pullCredentialsId='', registry=DockerRegistryEndpoint[null;credentialsId=null], dockerCommand='', hostname='', dnsHosts=[], network='', volumes=[], volumesFrom2=[], environment=[], bindPorts='', bindAllPorts=false, memoryLimit=null, memorySwap=null, cpuShares=null, shmSize=null, privileged=false, tty=false, macAddress='null', extraHosts=[]}, removeVolumes=false, pullStrategy=PULL_ALWAYS, nodeProperties=[], disabled=BySystem,0 ms,4 min 59 sec,Template provisioning failed.}' for cloud='windows cloud agents'
com.github.dockerjava.api.exception.NotFoundException: {"message":"Could not find the file /root in container 6e1c47d50b63636f580761fbe2b367ba53c902a07ad648eba7edd5c88c826d04"}
at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:103)
at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:33)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
According to the message Could not find the file /root in container it looks like the assumption is still a Linux container.
What would cause Jenkins to use the docker-workflow-plugin instead of docker-plugin

It all comes down to what pipeline syntax you use. Different plugins provide different functionality with different words.
The docker-plugin provides very little pipeline functionality, just the dockerNode keyword.
The docker-workflow-plugin is what most people are using when they're doing pipelines with docker, as that's what's documented in the Jenkins documentation - that's the plugin that provides pipeline keywords like withDockerRegistry or docker.image etc.
One other indication: if your logs mention any docker command-line stuff then that's the docker-workflow-plugin - the docker-plugin doesn't use/need a docker command-line client, as it uses a Java docker client to talk to docker daemons.
If you've defined some clouds and templates in Manage Jenkins -> Configure System -> docker clouds, and then defined a pipeline to use a slave whose label matches one of the templates you've defined, then your builds would be running on docker containers created by the docker-plugin. FYI, that's the docker-plugin's primary use case.
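For reference, a sketch of what that single keyword looks like in a scripted pipeline. This is illustrative only: the host URI, image, and remoteFs values are placeholders, not from this thread, and the exact parameter set depends on your docker-plugin version:

```groovy
// Scripted-pipeline sketch of the docker-plugin's dockerNode step.
// All values below are hypothetical examples - substitute your own.
dockerNode(dockerHost: 'tcp://my-docker-host:2375',
           image: 'jenkins/agent:latest',
           remoteFs: '/home/jenkins') {
    // Everything in this block runs on a throwaway container agent
    // provisioned by the docker-plugin (no docker CLI involved).
    sh 'hostname'
}
```

Note the contrast with docker-workflow-plugin syntax: there is no docker {} agent directive and no withDockerRegistry here.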
...and, yes, I know it's confusing, which is why the README, ISSUE_TEMPLATE and CONTRIBUTING docs in this plugin all mention this issue and tell folks to be sure of what they're using so they can report things in the right place, because it confuses the hell out of everyone and makes it all too easy to make mistakes.
docker-plugin is trying to use the nohup command
FYI the docker-plugin knows nothing of the nohup command; the word "nohup" is not in its code (it's not in the docker-workflow-plugin's code either).
However, nohup is what the Jenkins durable-task-plugin's step will add when it's not on a Darwin (Mac) OS (for Windows, it assumes you're using Cygwin and will have nohup).
I'd suggest that, when doing Windows pipelines, you use something like the bat pipeline command.
Or, alternatively, use the echo pipeline command, as that's platform agnostic, or run a groovy command to list all the environment variables etc.
Could not find the file /root in container
Aha! Yes, now we're getting somewhere :grin:
These logs did come from the docker-plugin.
I've checked the plugin's code and, sure enough, /root/ is in the code - if your template is defined to use the SSH connector and you're using an injected key, then it'll start the container with the command /usr/sbin/sshd -D -p <port> -o AuthorizedKeysCommand=/root/authorized_key -o AuthorizedKeysCommandUser=root (see DockerComputerSSHConnector.java line 180) ... and it would also try to run a /bin/sh script in the container to inject the key when the container starts, so that's not going to work on Windows.
However, even if you don't inject a key, the SSH connector would still tell the container to start with the command /usr/sbin/sshd -D -p <port>, and that's unlikely to work on Windows either.
I think you need to get your Windows containers to use the JNLP or Attach connection method instead. It looks like the SSH method has a lot of hard-coded unixy stuff in it at present. ...and, to be frank, I wouldn't recommend relying on SSH on Windows - an SSH environment in Windows is tricky to make work and be secure; you usually end up having to choose between "secure" and "lets you do everything your build needs to do".
It looks like the Attach method runs the following command on the container: java -jar <remoteFs>/slave.jar -noReconnect -noKeepAlive -slaveLog <remoteFs>/agent.log
So, as long as you ensure that java is on the %PATH%, that you're setting the template's remoteFs correctly, and that slave.jar is already present there, then it's likely to "just work".
The JNLP method provides more customisation capabilities (hidden in its "advanced" bits) so you can specify exactly what command the container should run, so if Attach doesn't work then JNLP can be forced to work.
The docker-plugin provides very little pipeline functionality, just the dockerNode keyword.
This is the critical differentiator and sadly this wasn't mentioned anywhere in the extensive reading I've done on these plugins.
...and, yes, I know it's confusing, which is why the README, ISSUE_TEMPLATE and CONTRIBUTING docs in this plugin all mention this issue and tell folks to be sure of what they're using so they can report things in the right place, because it confuses the hell out of everyone and makes it all too easy to make mistakes.
I have read those and do apologize for letting this thread slip from a clarification request for the original FR into a Q&A. I did try the mailing list and even Reddit, and it's mostly dead air out there for these kinds of questions. Zero responses elsewhere, sadly. Additionally, even in those links there isn't anything that clearly states how to tell the difference at the top layer of the syntax (so to speak). We are using just the docker {} and dockerfile {} syntax without directly specifying any of the underlying calls like withDockerRegistry, so until that started bubbling up in some of these Windows jobs it was not clear to us that the docker-workflow (aka pipeline) plugin was actually being used, and there seemed to be evidence to the contrary. Furthermore, even in deep conversations in this thread, despite detailed examples of what we were trying to do (which I would have thought would have made it obvious which plugin was actually being used), nothing was said about the difference between docker-plugin and docker-workflow-plugin in our context.
Since we want to "codify" everything directly in the declarative pipeline scripts and/or Dockerfile files (via SCM), it seems we're restricted to using the docker pipeline plugin. It's unclear to me if this docker-plugin may be able to suit our needs via dockerNode in declarative pipeline if/when PR 681 is eventually released. Hopefully one day things will be merged / deprecated / documented as necessary to make this all more clear.
I'd suggest that, when doing Windows pipelines, you use something like the bat pipeline command. Or, alternatively, use the echo pipeline command as that's platform agnostic, or run a groovy command to list all the environment variables etc.
Attempted to use both echo and bat, but the nohup failure is occurring before those lines are even reached. It's failing at the first step of creating the container, as far as I can tell from the log. I will investigate the Cygwin approach and try to pursue this elsewhere in the context of the docker pipeline plugin.
Greatly appreciate you taking the time to respond here and apologies again for derailing this thread.
sadly this wasn't mentioned anywhere
Yes; sadly a lack of docs is not an uncommon issue with free software - the folks who write code to make it do things don't need documentation telling them what they did. Where I've enhanced the docker-plugin, I've tried to also enhance the help text that's built into the UI, but that doesn't affect the official documentation (which is mostly telling folks to use stuff that's provided by the docker-workflow-plugin ... which is why I believed that this plugin would be superseded by it until the discussion in #681 said otherwise).
if/when PR 681 is eventually released
FYI you can try out PR 681 right now - or any PR - go to the "checks" bit at the bottom, click "show checks" and follow the link to the build (the pr-merge bit) "details" to take you to the Jenkins ci server that built it, and then to the "Artifacts" from the build - there you'll find a .hpi file you can download and install (via manage jenkins -> manage plugins -> advanced -> upload). To be honest, I could really do with a 2nd-opinion on that PR as it's totally outside my knowhow, so please give it a test and let me know if it's worth having.
nohup

I'm not sure where it's coming from, as github.com can't find "nohup" in docker-plugin or docker-workflow-plugin. If you can figure out what parameters are being used when the container is created, then you'll be able to see if it's something coming from Jenkins or something built into the container image itself (maybe docker inspect can help here too).
taking the time to respond
FYI I'd like to have things working on Windows too; at present, where I work, all our Windows-based stuff is on VMs (which take an age to boot up) and docker containers are lighter-weight and more efficient, which would mean I get more builds done on existing hardware. If I can un-stick you, maybe I'll learn how to do the same myself... i.e. it's not all altruism - I want it too ;-)
Yes; sadly a lack of docs is not an uncommon issue with free software - the folks who write code to make it do things don't need documentation telling them what they did.
Indeed. I totally get the struggle. Developers' time is precious, especially when, as is often the case, the work is done 'side of the desk' to a real job or other commitments. The efforts are greatly appreciated and valuable to so many people. Documentation falls to the back burner 90% of the time, and I've seen full-blown commercial, enterprise (and expensive) software with worse documentation than open-source projects maintained by a single person. Docs are often the trade-off for free software.
if/when PR 681 is eventually released
FYI you can try out PR 681 right now
I would love to do this but unsure if I will be able to near-term. I have some other things like this Windows build agent I need to squash first. Granted the docker-plugin might help solve that or work around it but it seems like there are some underlying issues here that need sorting first.
nohup
I'm not sure where it's coming from
The Jenkins job build log doesn't give any visibility and the main Jenkins master log has no entries related to this job. Is there additional logging that can be enabled that's relevant to this?
Running on Windows in C:/jenkins/workspace/Ephemeral Build Agent PoC/Ephemeral Build Agent PoC v6 Docker enabled slave Windows
[Pipeline] {
[Pipeline] sh
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
...
Caused: java.io.IOException: Cannot run program "nohup"
I ran another test pipeline job that uses the same Windows node without the docker {} syntax and it ran fine. The docker-based job is still trying to run an sh command at the start of the job despite sh not being used anywhere in the pipeline script explicitly. Hence my assumption it's related to the docker plugin (though I guess the docker pipeline plugin in this case). That's the only difference in the pipeline scripts. Perhaps, as you said earlier, sh (and by proxy nohup) is being called by the durable-task-plugin?
Successful non-docker job
pipeline {
    agent {
        label 'dockerEnabledWindowsSlave'
    }
    stages {
        stage('Example Build') {
            steps {
                echo 'test'
            }
        }
    }
}
Unsuccessful docker job
pipeline {
    agent {
        docker {
            image 'iis'
            label 'dockerEnabledWindowsSlave'
        }
    }
    stages {
        stage('Example Build') {
            steps {
                echo 'test'
            }
        }
    }
}
i.e. it's not all altruism - I want it too ;-)
Maybe we can figure something out together and even update the doc :D
At this point I may try to get nohup working via the method mentioned earlier, to see if we can at least get to the next step and see what it's trying to do, unless you have another suggestion.
Edit: I successfully set up the git-bash tools via this suggestion and it resolved the sh/nohup issue. Now I am running into the "invalid volume specification" error, which I see you've discussed here and which is clearly related to the docker pipeline plugin, not this one (as we've already established): https://github.com/jenkinsci/docker-plugin/issues/666. Looking here, it seems like this error might be a dead end for docker pipeline on Windows. I'll keep digging elsewhere.
Suggestion: maybe include something at the bottom of the readme.md for `docker-plugin` that mentions that `dockerNode` and the Jenkins UI are specific to this plugin, and that the absence of those also means it's probably the docker-workflow-plugin that's being used?
> Looking here it seems like this error might be a dead end for docker pipeline on Windows.

Assuming this is true, we may need to migrate to this docker-plugin, at least for the Windows stuff if not everything. That being said, can docker-plugin support declarative pipelines AND Dockerfile files referenced in the declarative pipeline script? It doesn't seem like it, based on the documentation and my experimenting so far.
I think you need to get your Windows containers to use the JNLP or Attach connection method instead. It looks like the SSH method has a lot of hard-coded unixy stuff in it at present. ...and, to be frank, I wouldn't recommend relying on SSH on Windows - an SSH environment in Windows is tricky to make work and be secure; you usually end up having to choose between "secure" and "lets you do everything your build needs to do".
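For context, a JNLP (inbound) agent works the other way round: the container runs the agent JAR and dials out to the Jenkins master, so no SSH server is needed inside the Windows container at all. A rough sketch of what the container ends up executing - the URL, node name, and secret here are illustrative placeholders, and when the docker plugin's JNLP connect method is used, the plugin supplies these values itself:

```shell
# Sketch only: how an inbound (JNLP) agent connects back to the master.
# jenkins.example.com, win-agent and <secret> are placeholders for your setup.
java -jar agent.jar \
    -jnlpUrl http://jenkins.example.com/computer/win-agent/slave-agent.jnlp \
    -secret <secret>
```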
You are right, it's configured with SSH key injection. In the container, OpenSSH is already running and works ;-) I can connect to it via SSH. Therefore the idea was to use the same mechanism as for the Linux containers.
But I will try the JNLP approach, as the Windows containers are running on a different machine than the Jenkins master.
I've got our entire build setup working on Windows-based images. Your configuration is incorrect; this should not be an issue for this plugin.
With ssh key injection configured?
Update: I recently had to go delving in this area in order to fix the SSH unit-tests and so I took a good long look at the code.
In the process of trying to figure out why the SSH-connector unit tests had stopped working, I coded up a connector that avoids specifying `/bin/sshd` as the CMD; it just passes the SSH key as the sole argument (i.e. exactly as the standard Jenkins SSH-slave image wants).
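As an illustration (an untested sketch; the image tag and key value are placeholders), the standard Jenkins SSH-slave image expects exactly this usage - the public key as the container's only argument, which its startup script installs for the `jenkins` user before starting sshd:

```shell
# Sketch: what the new connection method effectively asks docker to run.
# The startup script in the jenkinsci/ssh-slave image writes this public key
# to the jenkins user's authorized_keys and then starts the SSH daemon.
docker run -d --name demo-agent jenkinsci/ssh-slave "ssh-rsa AAAA...public-key..."
```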
Disclaimer: This code is not finished. It's not as polished as it could be; at the very least, it'll need improvements to the online help to explain the difference between the connection methods (as it's obvious that this plugin needs better docs in this area!). It's also not tested - the only testing I've done is run the unit-tests, and only on linux (I have no Windows docker resource at present). However, the new SSH connection method (if it works at all) might well work for Windows docker folks where the others do not; it might be worth your while trying it out.
You can find this code in PR #763 and you can find a `.hpi` file here - that `.hpi` file is built from the master branch (i.e. latest bleeding-edge code, aka release 1.1.9 right now) plus that PR's changes.
If these changes are well received then it'd be worthwhile improving them to the point where they're fit for merge...
@pjdarton This sounds good. I hope I can try it today and give some feedback.
The same error happened:
Provisioning 'lp13007:5000/docker-ssh-slave:win-1903' number 1 (of 1) on 'windows cloud agents'; Total containers: 0 (of 4)
Dec 05, 2019 11:03:32 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Will provision 'lp13007:5000/docker-ssh-slave:win-1903', for label: 'win-agent', in cloud: 'windows cloud agents'
Dec 05, 2019 11:03:32 AM INFO hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
Started provisioning Image of lp13007:5000/docker-ssh-slave:win-1903 from windows cloud agents with 1 executors. Remaining excess workload: 0
Dec 05, 2019 11:03:32 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage
Pulling image 'lp13007:5000/docker-ssh-slave:win-1903'. This may take awhile...
Dec 05, 2019 11:03:32 AM INFO io.jenkins.docker.client.DockerAPI getOrMakeClient
Cached connection io.jenkins.docker.client.DockerAPI$SharableDockerClient@1c14058f to DockerClientParameters{dockerUri=tcp://lp13007:2375, credentialsId=null, readTimeoutInMsOrNull=300000, connectTimeoutInMsOrNull=60000}
Dec 05, 2019 11:03:33 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage
Finished pulling image 'lp13007:5000/docker-ssh-slave:win-1903', took 1058 ms
Dec 05, 2019 11:03:33 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Trying to run container for lp13007:5000/docker-ssh-slave:win-1903
Dec 05, 2019 11:03:33 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Trying to run container for node win-agent-0002nvuz6oiow from image: lp13007:5000/docker-ssh-slave:win-1903
Dec 05, 2019 11:03:33 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
Started container ID 96ef9a8f7f43d232a75c52e4ad350ce66d457da9ddb93798cef62afc6fc290a8 for node win-agent-0002nvuz6oiow from image: lp13007:5000/docker-ssh-slave:win-1903
Dec 05, 2019 11:03:33 AM SEVERE com.github.dockerjava.core.async.ResultCallbackTemplate onError
Error during callback
com.github.dockerjava.api.exception.NotFoundException: {"message":"Could not find the file /root in container 96ef9a8f7f43d232a75c52e4ad350ce66d457da9ddb93798cef62afc6fc290a8"}
=> "Could not find the file /root in container ...
The only place `/root` appears in the plugin is when it's using the `InjectSSHKey` connection method, which (with this PR's code installed) shows up in the WebUI configuration page as "Inject SSH key using SSH AuthorizedKeysCommand option" (previously, this option was simply called "Inject SSH key").
You need to switch to the new `InjectSSHKeyAsContainerArgument` connection method, which shows up in the WebUI configuration page as "Inject SSH key as 1st container argument".
Argh ... I forgot to change to `InjectSSHKeyAsContainerArgument`. I will try again and come back.
Result looks better now. The container itself is started and it looks like (according to the logs) an SSH connection was established (the SSH port is open on lp13007:55137).
The Jenkins pipeline script should now just call a PowerShell 'dir' command, but this doesn't happen.
According to the logs, another agent is requested, and so on.
Logs:
Started container ID 5d61972f90fc76d696fad74efb9866eeaf1e598143878c348f506f3d9d597196 for node win-agent-0003fn8t8zrkc from image: lp13007:5000/docker-ssh-slave:win-1903
Dec 06, 2019 8:49:03 AM INFO com.nirima.jenkins.plugins.docker.utils.PortUtils$ConnectionCheckSSH executeOnce
SSH port is open on lp13007:55147
Dec 06, 2019 8:49:04 AM INFO hudson.slaves.NodeProvisioner lambda$update$6
Image of lp13007:5000/docker-ssh-slave:win-1903 provisioning successfully completed. We have now 3 computer(s)
Dec 06, 2019 8:49:04 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:04 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
Dec 06, 2019 8:49:04 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:04 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
Dec 06, 2019 8:49:14 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:14 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
Dec 06, 2019 8:49:14 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:14 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
Dec 06, 2019 8:49:24 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:24 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
Dec 06, 2019 8:49:24 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
Asked to provision 1 slave(s) for: win-agent
Dec 06, 2019 8:49:24 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedSlave
Not Provisioning 'lp13007:5000/docker-ssh-slave:win-1903'. Template instance limit of '1' reached on cloud 'windows cloud agents'
This repeats after the container watchdog is triggered.
Hmm. Those logs show that the container started and its SSH port opened. They don't show much more than that :worried: ... but at least they're not showing an exception :grin:
The fact that the docker plugin is still being asked to provision a node 20 seconds later implies that the slave failed to come online (i.e. the container exists, but Jenkins wasn't able to connect to it and run the Jenkins slave.jar code on it), which would imply that the SSH connection process didn't complete ... but that wouldn't show up here.
I think the next place to look would be the log for the slave node itself,
i.e. you should see the docker slave node appearing in Jenkins' list of executors/slaves, and that WebUI page has a "log" page on it - check what that's reporting, as that's where any SSHConnector issues will be shown. (For example, when I was debugging why the plugin's ssh-connector unit tests were failing, I eventually found the "we can't find where java is on this container" error in the slave's log page.)
Checking the agent's log was a good hint:
SSHLauncher{host='lp13007', port=55267, credentialsId='InstanceIdentity', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=210, maxNumRetries=30, retryWaitTime=2, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[12/09/19 11:51:48] [SSH] Opening SSH connection to lp13007:55267.
[12/09/19 11:51:49] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection.
ERROR: Server rejected the 1 private key(s) for root (credentialId:InstanceIdentity/method:publickey)
[12/09/19 11:51:49] [SSH] Authentication failed.
Authentication failed.
[12/09/19 11:51:49] Launch failed - cleaning up connection
[12/09/19 11:51:49] [SSH] Connection closed.
The node itself uses an image (a Windows container) based on https://github.com/jenkinsci/docker-ssh-slave
OK, so the `WARNING: SSH Host Keys are not being verified` is a good sign...
...but the `ERROR: Server rejected the 1 private key(s) for root` is not.
However, the cause is revealed right there - you're trying to log in to a Windows container as "root".
I believe that the username should be `jenkins` (for both the Windows and Linux docker-ssh-slave images),
i.e. it must not be root for a Windows image :grin:
Once you've sorted that out, if it still isn't working then the next step of the investigation is to use `docker inspect` on the container (which you'll have to do soon after it's created, otherwise it'll be cleaned up).
What we're looking for there is an indication of the argument(s) provided to the container by the docker plugin's new `InjectSSHKeyAsContainerArgument` method.
If that looks good then that implies the fault is in the container ... if it doesn't look right then the fault would be in my code.
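A sketch of that inspection step (the container ID is a placeholder - use the one reported in the provisioning log, and run this promptly before the container gets cleaned up):

```shell
# Show the entrypoint, command and arguments the docker plugin gave the container.
# Cmd/Args should contain the injected SSH public key if the new
# InjectSSHKeyAsContainerArgument method did its job.
docker inspect --format '{{json .Config.Entrypoint}} {{json .Config.Cmd}} {{json .Args}}' 96ef9a8f7f43
```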
FYI, what I'm expecting is that you'll tell the docker-plugin code that it needs to log in as "jenkins" (and probably have to also tell it the home directory is `C:/Users/jenkins`), the plugin will tell the container the public key it should accept (which will show up in `docker inspect`), and then the SSH connection code should try to connect as user "jenkins" using the private key matching the public key the container was told to accept (which will show up in the slave's log) ... and it should work...
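If you want to rule the container out, a manual check along these lines should work - a sketch, where the host, port, and key file path are placeholders taken from your own provisioning logs:

```shell
# Sketch: confirm the container accepts the injected key for user 'jenkins'.
# lp13007:55137 and the key path are placeholders from the logs above.
ssh -i ./injected-private-key -p 55137 jenkins@lp13007 whoami
```

If this logs in, the container side is fine and the problem lies in the plugin's connection settings.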
The "Remote File System Root" configuration parameter is set to "C:\Users\jenkins". The user inside the container should be jenkins, that's true. But I cannot configure the user with InjectSSHKeyAs1stParameter, only with a different configuration. This may be the problem.
Ah... yes, that would be a problem - it definitely needs to be `jenkins` for the public Windows docker image (unless you've built one yourself and passed in USER=root during the image build process).
I probably missed out a config.jelly file allowing this to be configured (I've never fully understood the Jenkins jelly/binding process, so some "trial and error" always seems necessary...).
Before I go rummaging in the code ... can you confirm that you're able to configure the username for the original key-injection method, but you can't for the new one (meaning that it stays with the default value of `root`)?
If that's correct then I'm pretty sure I know what I'll need to do to fix this... I just don't have time right this minute...
In the meantime: if you are able to manually edit your Jenkins server's `config.xml` file then you need not wait for me to get that done - you can use a text editor to fiddle with the cloud configuration even though the Jenkins WebUI is missing that field.
Use the Jenkins WebUI to configure Jenkins to use the new connection method (so that'll tell it to use the default `root` username), and save that (which will save the data to the file `config.xml` in your Jenkins home directory).
Edit the `config.xml` file and look for the `<connector class="io.jenkins.docker.connector.DockerComputerSSHConnector">` section.
Find the `<sshKeyStrategy class="io.jenkins.docker.connector.DockerComputerSSHConnector$InjectSSHKeyAsContainerArgument">` element - that should have a `<user>root</user>` element (or might not have that element at all).
Change "root" to "jenkins":
```xml
<sshKeyStrategy class="io.jenkins.docker.connector.DockerComputerSSHConnector$InjectSSHKeyAsContainerArgument">
  <user>jenkins</user>
</sshKeyStrategy>
```
Save the file. Tell Jenkins to "Reload configuration from disk".
That should let you manually do what the WebUI doesn't currently let you do, and should let you get further with testing while you wait for an updated plugin (which might take a while as I'm busy on other things at present - I can spare the time to type out advice here, but coding will take longer...)
There are three options to choose from:
- Inject SSH key as 1st container argument
- Inject SSH key using SSH AuthorizedKeysCommand option
- Use configured SSH credentials
Choosing the 2nd one, it's possible to enter the username. I guess it's the old "Inject SSH key" option, but you changed the text, didn't you?
I will follow your suggestion, changing the username in the XML file, and come back with feedback.
BTW, before I forget: thanks for your support, it's very much appreciated! We're keeping it similar to pair programming - you advise and I test. ;-)