Forked task failed to remove serialized task context
Here is my environment:
Hosts: 3. Nodes: two NodeStarter processes on each host.
When I run the workflow, I always get the "Forked task failed to remove serialized task context" error.
After I changed the configuration to one NodeStarter per host, the whole workflow succeeded.
Please advise. Thanks.
[2016-02-25 14:12:43,905 INFO o.o.p.s.u.TaskLogger] task 1365t7 (Task172) started on m-dn02(node: SSH-dn02-1)
[2016-02-25 14:12:44,686 ERROR o.o.p.s.u.TaskLogger] task 1365t7 (Task172) error
org.ow2.proactive.scheduler.task.exceptions.ForkedJvmProcessException: Failed to execute task in a forked JVM
    at org.ow2.proactive.scheduler.task.executors.ForkedTaskExecutor.createTaskResult(ForkedTaskExecutor.java:164)
    at org.ow2.proactive.scheduler.task.executors.ForkedTaskExecutor.execute(ForkedTaskExecutor.java:133)
    at org.ow2.proactive.scheduler.task.TaskLauncher.doTask(TaskLauncher.java:172)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.objectweb.proactive.core.mop.MethodCall.execute(MethodCall.java:353)
    at org.objectweb.proactive.core.body.request.RequestImpl.serveInternal(RequestImpl.java:214)
    at org.objectweb.proactive.core.body.request.RequestImpl.serve(RequestImpl.java:160)
    at org.objectweb.proactive.core.body.BodyImpl$ActiveLocalBodyStrategy.serveInternal(BodyImpl.java:552)
    at org.objectweb.proactive.core.body.BodyImpl$ActiveLocalBodyStrategy.serve(BodyImpl.java:485)
    at org.objectweb.proactive.core.body.AbstractBody.serve(AbstractBody.java:426)
    at org.objectweb.proactive.Service.blockingServeOldest(Service.java:206)
    at org.objectweb.proactive.Service.blockingServeOldest(Service.java:181)
    at org.objectweb.proactive.Service.fifoServing(Service.java:146)
    at org.objectweb.proactive.core.body.ActiveBody$FIFORunActive.runActivity(ActiveBody.java:337)
    at org.objectweb.proactive.core.body.ActiveBody.run(ActiveBody.java:175)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Forked task failed to remove serialized task context, probably a permission issue on folder /tmp/PA_JVM210940313/SSH-dn02-1/1365/771415615
    ... 18 more
[2016-02-25 14:12:44,687 INFO o.o.p.s.u.TaskLogger] task 1365t7 (Task172) finished with errors
Hello,
Can you provide more information?
1) What results does your workflow return?
2) Which languages does your workflow use?
3) How many tasks are in the workflow, and which task is failing?
4) Which language executes the failing task?
First thoughts: the message "Forked task failed to remove serialized task context, probably a permission issue on folder /tmp/PA_JVM210940313/SSH-dn02-1/1365/771415615" points to file permissions. Are you running all the nodes (NodeStarter) as the same user? Are you using RunAsMe? If so, does the user the task runs as lack write access to the files created by the user running the NodeStarter, or vice versa?
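One way to test the permission hypothesis is to look at who owns each directory on the path from the error message, and whether the current user is allowed to delete entries inside it. The following is only a rough sketch in Python, to be run on the affected host as the user that starts the NodeStarter. `report_permissions` is a hypothetical helper written for this thread and is not part of ProActive.

```python
import os
import stat


def report_permissions(leaf, root):
    """Describe each directory from `leaf` up to `root` (both inclusive).

    Hypothetical diagnostic helper, not a ProActive API. Deleting a file
    requires write and execute permission on the directory that contains it,
    so that is what `can_delete_entries` checks for the current user.
    """
    leaf = os.path.abspath(leaf)
    root = os.path.abspath(root)
    rows = []
    current = leaf
    while True:
        st = os.stat(current)
        rows.append({
            "path": current,
            "owner_uid": st.st_uid,
            "mode": stat.filemode(st.st_mode),
            "can_delete_entries": os.access(current, os.W_OK | os.X_OK),
        })
        parent = os.path.dirname(current)
        # Stop at the requested root, or at the filesystem root as a safety net.
        if current == root or parent == current:
            break
        current = parent
    return rows
```

For example, `report_permissions("/tmp/PA_JVM210940313/SSH-dn02-1/1365", "/tmp")` (path taken from the log above) would list the owner and mode of each level. A row where `owner_uid` differs from the NodeStarter user, or where `can_delete_entries` is false, would support the permission explanation. If every row looks fine, the cause is probably elsewhere.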
An issue with a similar error message was reported a few days ago (#2468), using the RunAsMe mode.
But the scenario you describe seems different, and it is curious that the number of ProActive nodes you deploy changes the behavior.
If I understood correctly, you deployed your infrastructure using an SSHInfrastructure (or SSHInfrastructureV2). Or did you start the ProActive nodes manually on each machine with the command <scheduling_folder>/bin/proactive-node?
1) What results does your workflow return? -- It is a Hive job that just submits an ETL task (a Java program) to the Hadoop cluster; the client does not need to do any further work.
2) What type of languages does your workflow use? -- Native. A bash shell script is used to start the Java program.
3) How many tasks are inside the workflow and which task is failing? -- One parent job and 10 sub-jobs running in parallel. The failing task is completely random.
4) Which language executes the failing task? -- Both use native, which calls the Java program.
Deployed your infrastructure using an SSHInfrastructure? -- Yes. Both NodeStarters are started as the same user. By the way, RunAsMe mode is not enabled.