yet-another-docker-plugin

Remote call failed

Open farahfa opened this issue 8 years ago • 18 comments

Hello,

Jenkins jobs (Maven jobs) fail intermittently with the following error:

Modules changed, recalculating dependency graph
Established TCP socket on 37493
maven33-agent.jar already up to date
maven33-interceptor.jar already up to date
maven3-interceptor-commons.jar already up to date
[Build.eng-idm.release.tif-sso] $ java -cp /var/lib/jenkins/maven33-agent.jar:/usr/share/maven/boot/plexus-classworlds-2.5.2.jar:/usr/share/maven/conf/logging jenkins.maven3.agent.Maven33Main /usr/share/maven /var/lib/jenkins/slave.jar /var/lib/jenkins/maven33-interceptor.jar /var/lib/jenkins/maven3-interceptor-commons.jar 37493
ERROR: Failed to parse POMs
java.io.IOException: Remote call on Docker-4b5a77361472 failed
	at hudson.remoting.Channel.call(Channel.java:838)
	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:257)
	at hudson.maven.$Proxy91.accept(Unknown Source)
	at hudson.maven.AbstractMavenProcessFactory.newProcess(AbstractMavenProcessFactory.java:282)
	at hudson.maven.ProcessCache.get(ProcessCache.java:236)
	at hudson.maven.MavenModuleSetBuild$MavenModuleSetBuildExecution.doRun(MavenModuleSetBuild.java:798)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:534)
	at hudson.model.Run.execute(Run.java:1728)
	at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:544)
	at hudson.model.ResourceController.execute(ResourceController.java:98)
	at hudson.model.Executor.run(Executor.java:405)
Caused by: java.lang.LinkageError: Failed to load hudson.remoting.Pipe$ConnectCommand
	at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:377)
	at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:285)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at hudson.maven.AbstractMavenProcessFactory$Connection.writeReplace(AbstractMavenProcessFactory.java:163)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1118)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1136)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
	at hudson.remoting.UserRequest._serialize(UserRequest.java:190)
	at hudson.remoting.UserRequest.serialize(UserRequest.java:199)
	at hudson.remoting.UserRequest.perform(UserRequest.java:161)
	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
	at hudson.remoting.Request$2.run(Request.java:336)
	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
	at ......remote call to Docker-4b5a77361472(Native Method)
	at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1545)
	at hudson.remoting.UserResponse.retrieve(UserRequest.java:253)
	at hudson.remoting.Channel.call(Channel.java:830)
	... 10 more
Caused by: java.lang.IllegalAccessError: class hudson.remoting.Pipe$ConnectCommand cannot access its superclass hudson.remoting.Command
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
	at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:373)
	at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:285)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at hudson.maven.AbstractMavenProcessFactory$Connection.writeReplace(AbstractMavenProcessFactory.java:163)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.io.ObjectStreamClass.invokeWriteReplace(ObjectStreamClass.java:1118)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1136)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
	at hudson.remoting.UserRequest._serialize(UserRequest.java:190)
	at hudson.remoting.UserRequest.serialize(UserRequest.java:199)
	at hudson.remoting.UserRequest.perform(UserRequest.java:161)
	at hudson.remoting.UserRequest.perform(UserRequest.java:50)
	at hudson.remoting.Request$2.run(Request.java:336)
	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
[YAD-PLUGIN] Injecting DOCKER_CONTAINER_ID variable.
[YAD-PLUGIN] Injecting JENKINS_CLOUD_ID variable.
[YAD-PLUGIN] DOCKER_HOST variable.
Finished: FAILURE

This only happens when Jenkins provisions slaves from a Docker swarm; if I change the Docker URL to a single Docker host, it works fine. I am not sure where the problem originates, so I'm posting here to see whether it is caused by YADP or by something else.

Any help is much appreciated.

P.S.: This also happens in other jobs, which just hang forever (I think they lose the connection to the Docker swarm slave).

farahfa avatar Apr 03 '17 17:04 farahfa

What jenkins version is used?

maven33-agent.jar already up to date
maven33-interceptor.jar already up to date

Does it fail only for Maven projects? Is there any NAT between the Jenkins master and the containers? Which Docker daemon version is used? Swarm classic or swarm mode?

KostyaSha avatar Apr 07 '17 00:04 KostyaSha

  • Jenkins is 2.46.1 (LTS)

  • It fails on all types of projects from time to time (possibly some kind of race condition?), but it is most noticeable with Maven, which throws the errors mentioned in the OP. Some other jobs (for example, the ones using RVM) hang at a random point.

  • No NAT between master and containers

  • Docker version 1.11.2, build b9f10c9

  • Swarm classic

farahfa avatar Apr 10 '17 19:04 farahfa

IMHO, classloading issues are a core/remoting issue, cc @oleg-nenashev. I had issues with SystemProperties locally, but everything looked right and they were not reproducible. Jenkins here is connected via JNLP (or SSH in your case?), so the docker plugin doesn't look like the culprit.

KostyaSha avatar Apr 10 '17 19:04 KostyaSha

Jenkins is connected using ssh... Hmm, this is a weird issue indeed. I'm trying to pin-point where the error is coming from, but I cannot tell exactly. :(

At least I can rule out the docker plugin as the source of the problem.

farahfa avatar Apr 12 '17 17:04 farahfa

I'm hitting this problem with the latest Jenkins version and the Java 9 headless JRE. It was working until I wiped the jobs; after I created a new job, it started failing:

Xvfb stopping
FATAL: Remote call on docker-4b1f20c42f27 failed
java.lang.ClassNotFoundException: Classloading from system classloader disabled
	at hudson.remoting.RemoteClassLoader$ClassLoaderProxy.fetch4(RemoteClassLoader.java:834)
	at hudson.remoting.RemoteClassLoader$ClassLoaderProxy.fetch3(RemoteClassLoader.java:867)
	at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:896)
	at hudson.remoting.Request$2.run(Request.java:336)
	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
	at org.jenkinsci.remoting.CallableDecorator.call(CallableDecorator.java:19)
	at hudson.remoting.CallableDecoratorList$1.call(CallableDecoratorList.java:21)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
	at ......remote call to channel(Native Method)
	at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1554)
	at hudson.remoting.Request.call(Request.java:172)
	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:260)
	at com.sun.proxy.$Proxy6.fetch3(Unknown Source)
	at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:195)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:486)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:294)
	at hudson.util.ProcessTree$UnixReflection.(ProcessTree.java:699)
	at hudson.util.ProcessTree$UnixProcess.kill(ProcessTree.java:647)
	at hudson.util.ProcessTree$UnixProcess.killRecursively(ProcessTree.java:668)
	at hudson.util.ProcessTree$Unix.killAll(ProcessTree.java:589)
	at hudson.Launcher$RemoteLauncher$KillTask.call(Launcher.java:1091)
	at hudson.Launcher$RemoteLauncher$KillTask.call(Launcher.java:1082)
	at hudson.remoting.UserRequest.perform(UserRequest.java:181)
	at hudson.remoting.UserRequest.perform(UserRequest.java:52)
	at hudson.remoting.Request$2.run(Request.java:336)
	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1158)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:632)
	at java.lang.Thread.run(Thread.java:804)
Caused: java.lang.LinkageError
	at hudson.util.ProcessTree$UnixReflection.(ProcessTree.java:710)
	at hudson.util.ProcessTree$UnixProcess.kill(ProcessTree.java:647)
	at hudson.util.ProcessTree$UnixProcess.killRecursively(ProcessTree.java:668)
	at hudson.util.ProcessTree$Unix.killAll(ProcessTree.java:589)
	at hudson.Launcher$RemoteLauncher$KillTask.call(Launcher.java:1091)
	at hudson.Launcher$RemoteLauncher$KillTask.call(Launcher.java:1082)
	at hudson.remoting.UserRequest.perform(UserRequest.java:181)
	at hudson.remoting.UserRequest.perform(UserRequest.java:52)
	at hudson.remoting.Request$2.run(Request.java:336)
	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1158)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:632)
	at java.lang.Thread.run(Thread.java:804)
	at ......remote call to docker-4b1f20c42f27(Native Method)
	at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1554)
	at hudson.remoting.UserResponse.retrieve(UserRequest.java:281)
	at hudson.remoting.Channel.call(Channel.java:839)
Caused: java.io.IOException: Remote call on docker-4b1f20c42f27 failed
	at hudson.remoting.Channel.call(Channel.java:847)
	at hudson.Launcher$RemoteLauncher.kill(Launcher.java:1079)
	at org.jenkinsci.plugins.xvfb.Xvfb.shutdownAndCleanup(Xvfb.java:327)
	at org.jenkinsci.plugins.xvfb.XvfbDisposer.tearDown(XvfbDisposer.java:52)
	at jenkins.tasks.SimpleBuildWrapper$EnvironmentWrapper.tearDown(SimpleBuildWrapper.java:175)
	at hudson.model.Build$BuildExecution.doRun(Build.java:174)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:496)
	at hudson.model.Run.execute(Run.java:1737)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:97)
	at hudson.model.Executor.run(Executor.java:419)
Finished: FAILURE

jbarbera avatar Oct 06 '17 11:10 jbarbera

@jbarbera try running on Java 8; I have never seen such errors. Maybe they are related to the JDK.
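To check which Java version an agent container actually runs, something like the following could be executed inside it. This is only a sketch: java_major is a hypothetical helper name, and the parsing assumes the usual `version "..."` line printed by `java -version`.

```shell
#!/bin/sh
# java_major (hypothetical helper): print the major Java version
# for a version string such as "1.8.0_131" (legacy 1.x scheme)
# or "9.0.1" (newer scheme).
java_major() {
    v=$1
    case $v in
        1.*) v=${v#1.}; echo "${v%%.*}" ;;   # 1.8.0_131 -> 8
        *)   echo "${v%%[!0-9]*}" ;;         # 9.0.1 -> 9
    esac
}

# Report the JVM found on PATH, if there is one.
if command -v java >/dev/null 2>&1; then
    raw=$(java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)
    echo "java major version: $(java_major "$raw")"
fi
```

Running the same check on the Jenkins master and in the build container shows whether the two sides use different Java versions.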

KostyaSha avatar Oct 06 '17 11:10 KostyaSha

@KostyaSha, same issue. It was working well until I accidentally deleted the jobs folder on the Jenkins server; the problems started after I created a single new job.

jbarbera avatar Oct 09 '17 06:10 jbarbera

java.lang.ClassNotFoundException: Classloading from system classloader disabled

KostyaSha avatar Oct 09 '17 07:10 KostyaSha

We experienced this problem around the time we upgraded to Java 1.8. The error "FATAL: Remote call on <HOST> failed" would occur whenever the test process spawned child processes that weren't cleaned up by the end of the test script.

The workaround is to identify the background processes spawned by the test, kill them (kill <pid>) at the end of the test script, and sometimes (for good measure) add a sleep N statement so that each process has a few seconds to shut down after receiving the kill signal.

mislav avatar Nov 30 '17 20:11 mislav

@mislav interesting... could you get a snapshot of the docker ps tree command output?

KostyaSha avatar Nov 30 '17 20:11 KostyaSha

@KostyaSha Sorry, we've worked around the problem now and I don't have sample output from docker ps anymore. But the issue occurred even with child processes unrelated to Docker. Basically the workaround was:

kill <pid> # kill regular child process
docker kill <docker-id> # have docker kill a daemonized process
sleep 2 # allow some time for them to shut down
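A slightly more defensive variant of the same idea, as a sketch only: instead of a fixed sleep, poll until the process is gone and escalate to SIGKILL after a timeout. The helper name stop_child and the `sleep 60` stand-in process are assumptions for illustration, not what we actually ran.

```shell
#!/bin/sh
# stop_child (hypothetical helper): send SIGTERM to a process and wait
# until it has exited; send SIGKILL if it is still alive after the timeout.
# $1 = pid, $2 = seconds to wait before escalating (default 5).
stop_child() {
    pid=$1
    timeout=${2:-5}
    kill "$pid" 2>/dev/null || return 0          # already gone
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited after SIGTERM
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -9 "$pid" 2>/dev/null || true           # still alive: force it
    wait "$pid" 2>/dev/null || true
    return 0
}

# Stand-in for a background process started by the test script.
sleep 60 &
server_pid=$!

# Clean up even when the test script exits early.
trap 'stop_child "$server_pid"' EXIT
```

Registering the cleanup with trap means the child is stopped on every exit path of the test script, not only when the script reaches its last line.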

mislav avatar Dec 01 '17 13:12 mislav

Probably docker should be run with init? But AFAIK Jenkins runs a childRipper at the end of the build to kill all spawned child processes.
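For reference, on the docker command line the init option corresponds to the --init flag of docker run (Docker 1.13 and later), which starts a small init process as PID 1 in the container to forward signals and reap orphaned children. A minimal sketch, assuming a hypothetical wrapper named docker_run_init and a DRY_RUN variable, neither of which is part of Docker or of the plugin; how the plugin itself would pass this option is not covered here:

```shell
#!/bin/sh
# docker_run_init (hypothetical wrapper): run a container with an init
# process as PID 1. With DRY_RUN=1 the command is printed, not executed.
docker_run_init() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo docker run --init --rm "$@"
    else
        docker run --init --rm "$@"
    fi
}

# Example (prints the command only):
#   DRY_RUN=1 docker_run_init alpine ps
```

The Docker 1.11.2 daemon mentioned earlier in the thread predates this flag, so it would need an upgrade or an init binary baked into the image.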

KostyaSha avatar Dec 01 '17 14:12 KostyaSha

@mislav re your work around, are those the steps one would add as a post-build hook to sh-exec?

if so, what are you killing?

in my use case, jenkins master is running in a docker container and spawns build jobs that are in turn in containers. i am guessing the above post-build script would exec from the jenkins master container ... and if correct, i need to collect run results so that the job exits successfully.

thx for the work around notes!

jwtodd avatar Dec 01 '17 23:12 jwtodd

In this issue I see 2 different errors. Both are related to remoting, but they are different.

KostyaSha avatar Dec 02 '17 00:12 KostyaSha

@jwtodd

  1. The steps are added to the test script, not the post-build hook.
  2. We run neither Jenkins master nor build jobs in docker containers. Docker is merely used to spawn some processes within the build script.
  3. I'm not sure whether our solution is related to the problem that the OP is describing. We do not use the yet-another-docker-plugin project—I should have noted that earlier. I just posted here because the exception message and stack traces are the same, and because it was hard for us to track down the cause of the failure. I hope it helps someone while debugging their issue.

mislav avatar Dec 04 '17 02:12 mislav

ok ... in our env all infra is running in a container, namely: jenkins and sonarqube

spawned build jobs run in a newly created docker container via YADPlugin ... thx for this btw :)

in upgrading to jdk9 for build jobs only, we have run into this same issue. interestingly we have 2 such jdk9 based containers, one for a prototypical java9 app build and a second one for nodejs builds ... each baselining from the same container to satisfy jenkins-slave concerns.

now, while we have not done a lot with the nodejs build container other than prove it works with a shim/stub build exec sh ... it tears down cleanly whereas the java app with tests and all runs into the error stated here.

odd and a bummer :(

current work around is to run the jdk9 build on the jenkins master (container).

haven't had a chance to diagnose this one further ... but would love to push the job concerns back to build containers, for obvious reasons.

jwtodd avatar Dec 06 '17 01:12 jwtodd

Is this project public? How can I reproduce the issue?

KostyaSha avatar Dec 06 '17 01:12 KostyaSha

it isn't :(

i will see about publishing enough reference artifacts to reproduce this.

i did work on https://github.com/intuit/wasabi which shares some operational aspects of my current project ... i will see if that codebase run against my jenkins+YADP fails as well.

jwtodd avatar Dec 06 '17 04:12 jwtodd