TES backend doesn't seem to support the CWL allowed exit codes

Open kmhernan opened this issue 7 years ago • 12 comments

When using bunny + TES, a task that exits with a non-zero code that is acceptable under the CWL spec is still interpreted as failed by the rabix engine. I have tested the same workflow with the local execution backend and it runs as expected. While TES may return the error state, the exit code is stored in the TES TaskLog message and should be used in this case to override the error state provided by the TES backend.

kmhernan avatar Jan 11 '18 21:01 kmhernan
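
A minimal sketch of the override requested above, assuming the engine can read the executor's exit code from the task log. `ExitCodePolicy` and its methods are hypothetical and are not part of bunny or any TES client; the state names `COMPLETE`, `EXECUTOR_ERROR`, and `SYSTEM_ERROR` are assumed to match the TES schema in use and should be checked against the deployed version. The sketch ignores CWL's `temporaryFailCodes` and `permanentFailCodes`.

```java
import java.util.List;

public class ExitCodePolicy {

    // Hypothetical helper: a code counts as success when the tool lists it
    // in successCodes, or when it is 0 (the default when nothing is listed).
    public static boolean isSuccess(int exitCode, List<Integer> successCodes) {
        return successCodes.contains(exitCode) || exitCode == 0;
    }

    // Decides the final job status from the backend state plus the exit code.
    // tesState is assumed to be the task state string reported by the backend;
    // exitCode may be null when the executor never ran.
    public static boolean taskSucceeded(String tesState, Integer exitCode,
                                        List<Integer> successCodes) {
        if ("COMPLETE".equals(tesState)) {
            return true;
        }
        // An executor error only means "non-zero exit"; let CWL decide.
        if ("EXECUTOR_ERROR".equals(tesState) && exitCode != null) {
            return isSuccess(exitCode, successCodes);
        }
        // System errors, cancellations, and unknown states stay failures.
        return false;
    }
}
```

With `successCodes` of `[0, 1]`, `taskSucceeded("EXECUTOR_ERROR", 1, codes)` returns true while exit code 2 still fails. The override by itself does not make the backend stage output files for a task it considers failed.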

Are the output files uploaded when the TES task errors-out?

milos-ljubinkovic avatar Jan 11 '18 23:01 milos-ljubinkovic

> Are the output files uploaded when the TES task errors-out?

Ah, I hadn't thought about this.

Currently, Funnel will stop processing on the first failed executor, and will not upload output files. We've discussed changing this behavior, in order to provide a sort of "best effort" behavior, where Funnel tries to get you all the data it has. In other words, we could try to make Funnel upload any outputs it can find. There are some details to iron out there though. Currently it's an error if an output isn't found, which wouldn't be true in this situation.

buchanae avatar Jan 11 '18 23:01 buchanae

Bunny could wrap those tools with defined successCodes into a command that always exits with a 0 but stores the actual exit code somewhere and then evaluates the success state in the postprocess stage. This makes sense as it's a cwl feature so it should work independently of funnel's support for it.

milos-ljubinkovic avatar Jan 12 '18 11:01 milos-ljubinkovic
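
The wrapping idea above could look roughly like the following. This is only a sketch, not the code on any bunny branch: `ExitCodeWrapper`, `wrap`, `readExitCode`, and the exit-code file are hypothetical names, and the sketch assumes that the tool's command is already a correctly quoted shell string, that `/bin/sh` exists in the container, and that the file path contains no single quotes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class ExitCodeWrapper {

    // Builds an argv that runs the tool in a subshell, writes the tool's
    // real exit code to exitCodeFile, and then always exits with 0 so the
    // backend treats the task as complete and stages its outputs.
    public static List<String> wrap(String command, String exitCodeFile) {
        String script = "( " + command + " ); echo $? > '" + exitCodeFile + "'; exit 0";
        return Arrays.asList("/bin/sh", "-c", script);
    }

    // Reads the stored exit code back in the post-processing stage, where
    // it can be compared against the tool's successCodes.
    public static int readExitCode(Path exitCodeFile) throws IOException {
        String text = new String(Files.readAllBytes(exitCodeFile), StandardCharsets.UTF_8);
        return Integer.parseInt(text.trim());
    }
}
```

The exit-code file would have to be declared as a task output so that it is staged back alongside the tool's real outputs. Running the tool inside `( ... )` keeps an `exit` in the tool's command from terminating the wrapper before the code is recorded.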

Yeah, I was using a shared FS in these tests when I noticed it, but no, the output wasn't copied/linked over to the bunny directory structure from what I can see.

kmhernan avatar Jan 12 '18 12:01 kmhernan

I've made some quick changes on this branch: https://github.com/rabix/bunny/tree/tes/exitcodes

If there are allowed exit codes in the app the exit code is saved and overridden to 0 inside TES and then independently evaluated after execution.

I changed the way the command line is built to accommodate this, so unusual command lines might show some side effects.

milos-ljubinkovic avatar Jan 12 '18 13:01 milos-ljubinkovic

Awesome @milos-ljubinkovic ... my quick peek at the source suggests that this branch also supports the newer TES spec, correct? Since I'm testing with the newer funnel versions that have the newer TES spec, I have had to edit the source from older rabix versions... just double checking so I can test it with my workflow.

kmhernan avatar Jan 12 '18 14:01 kmhernan

It should support the latest TES spec and was tested against funnel's master branch on 10th January I think. Some issues with s3 and endpoints were reported, though.

milos-ljubinkovic avatar Jan 12 '18 14:01 milos-ljubinkovic

great... yeah we gave up on s3 for now and testing with ceph FS... will test this today thanks

kmhernan avatar Jan 12 '18 14:01 kmhernan

We are tracking this issue in Funnel https://github.com/ohsu-comp-bio/funnel/issues/425

adamstruck avatar Jan 12 '18 17:01 adamstruck

@milos-ljubinkovic it seems like I can't get around this exception with this branch:

java.lang.IllegalArgumentException: Illegal character in scheme name at index 0: {
  "appFileLocation" : "/mnt/cephfs/cwls/jeremiah/gdc-dnaseq-cwl/workflows/dnaseq/metrics.cwl",
  "successCodes" : [ ],
  "cwlVersion" : "v1.0",
  "inputs" : [ {
    "id" : "bam",
    "type" : "File",
    "scatter" : true
  }, {
    "id" : "known_snp",
    "type" : "File"
  }, {

It happens on both local and TES backends.

kmhernan avatar Jan 13 '18 01:01 kmhernan

I made a quick revert on that branch of a change that had something to do with ignoring IllegalArgumentExceptions, so it might help, but I couldn't really reproduce the issue. Could you upload your workflow or the full stack trace?

milos-ljubinkovic avatar Jan 13 '18 13:01 milos-ljubinkovic

@milos-ljubinkovic I think that's where the issue is. Here is more of the stack trace that I can easily grep out... I'm running again with your changes right now.

java.lang.IllegalArgumentException: Illegal character in scheme name at index 0: {
	at java.net.URI.create(URI.java:852) ~[na:1.8.0_141]
	at org.rabix.bindings.cwl.resolver.CWLDocumentResolver.resolve(CWLDocumentResolver.java:100) ~[rabix-cli.jar:na]
	at org.rabix.bindings.cwl.helper.CWLJobHelper.getCWLJob(CWLJobHelper.java:20) ~[rabix-cli.jar:na]
	at org.rabix.bindings.cwl.CWLProcessor.transformInputs(CWLProcessor.java:519) ~[rabix-cli.jar:na]
	at org.rabix.bindings.cwl.CWLBindings.transformInputs(CWLBindings.java:175) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.handleTransform(JobStatusEventHandler.java:356) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.ready(JobStatusEventHandler.java:289) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.handle(JobStatusEventHandler.java:109) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.handle(JobStatusEventHandler.java:43) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.EventProcessorImpl.send(EventProcessorImpl.java:210) [rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.MultiEventProcessorImpl.send(MultiEventProcessorImpl.java:59) ~[rabix-cli.jar:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_141]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_141]
	at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.InputEventHandler.handle(InputEventHandler.java:99) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.InputEventHandler.handle(InputEventHandler.java:27) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.EventProcessorImpl.send(EventProcessorImpl.java:210) [rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.MultiEventProcessorImpl.send(MultiEventProcessorImpl.java:59) ~[rabix-cli.jar:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_141]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_141]
	at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.ScatterHandler.createScatteredJobs(ScatterHandler.java:222) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.ScatterHandler.scatterPort(ScatterHandler.java:115) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.ready(JobStatusEventHandler.java:277) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.handle(JobStatusEventHandler.java:109) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.handle(JobStatusEventHandler.java:43) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.EventProcessorImpl.send(EventProcessorImpl.java:210) [rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.MultiEventProcessorImpl.send(MultiEventProcessorImpl.java:59) ~[rabix-cli.jar:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_141]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_141]
	at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.InputEventHandler.handle(InputEventHandler.java:99) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.InputEventHandler.handle(InputEventHandler.java:27) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.EventProcessorImpl.send(EventProcessorImpl.java:210) [rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.MultiEventProcessorImpl.send(MultiEventProcessorImpl.java:59) ~[rabix-cli.jar:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_141]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_141]
	at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.OutputEventHandler.handle(OutputEventHandler.java:112) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.OutputEventHandler.handle(OutputEventHandler.java:34) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.EventProcessorImpl.send(EventProcessorImpl.java:210) [rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.MultiEventProcessorImpl.send(MultiEventProcessorImpl.java:59) ~[rabix-cli.jar:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_141]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_141]
	at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.OutputEventHandler.handle(OutputEventHandler.java:112) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.OutputEventHandler.handle(OutputEventHandler.java:34) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.EventProcessorImpl.send(EventProcessorImpl.java:210) [rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.MultiEventProcessorImpl.send(MultiEventProcessorImpl.java:59) ~[rabix-cli.jar:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_141]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_141]
	at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:50) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.handle(JobStatusEventHandler.java:160) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.handler.impl.JobStatusEventHandler.handle(JobStatusEventHandler.java:43) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.EventProcessorImpl.handle(EventProcessorImpl.java:175) [rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.EventProcessorImpl.lambda$doProcessEvent$3(EventProcessorImpl.java:108) [rabix-cli.jar:na]
	at org.rabix.engine.store.memory.InMemoryRepositoryRegistry.doInTransaction(InMemoryRepositoryRegistry.java:92) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.EventProcessorImpl.doProcessEvent(EventProcessorImpl.java:107) [rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.EventProcessorImpl.lambda$null$1(EventProcessorImpl.java:91) [rabix-cli.jar:na]
	at org.rabix.engine.metrics.impl.MetricsHelperImpl.time(MetricsHelperImpl.java:78) ~[rabix-cli.jar:na]
	at org.rabix.engine.processor.impl.EventProcessorImpl.lambda$start$2(EventProcessorImpl.java:91) [rabix-cli.jar:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_141]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_141]
	at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_141]
Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 0: {
	at java.net.URI$Parser.fail(URI.java:2848) ~[na:1.8.0_141]
	at java.net.URI$Parser.checkChars(URI.java:3021) ~[na:1.8.0_141]
	at java.net.URI$Parser.checkChar(URI.java:3031) ~[na:1.8.0_141]
	at java.net.URI$Parser.parse(URI.java:3047) ~[na:1.8.0_141]
	at java.net.URI.<init>(URI.java:588) ~[na:1.8.0_141]
	at java.net.URI.create(URI.java:850) ~[na:1.8.0_141]

kmhernan avatar Jan 15 '18 17:01 kmhernan
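
The message in the trace above is what `java.net.URI.create` produces when it is handed a serialized JSON document instead of a path or URI: the first character, `{`, is not legal in a scheme name. The sketch below shows one way a caller could guard against that. It is not the bunny resolver code; `AppReference`, `isInlineJson`, and `tryParse` are hypothetical names, and the "starts with `{` or `[`" check is an assumed heuristic.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class AppReference {

    // Heuristic (an assumption, not bunny's logic): a reference that starts
    // with '{' or '[' is treated as an inline JSON document, not a location.
    public static boolean isInlineJson(String ref) {
        String trimmed = ref.trim();
        return trimmed.startsWith("{") || trimmed.startsWith("[");
    }

    // Returns the parsed URI, or null when the reference is inline JSON or
    // is otherwise not a syntactically valid URI.
    public static URI tryParse(String ref) {
        if (isInlineJson(ref)) {
            return null;
        }
        try {
            // new URI(...) throws the checked URISyntaxException, whereas
            // URI.create(...) wraps it in an IllegalArgumentException.
            return new URI(ref);
        } catch (URISyntaxException e) {
            return null;
        }
    }
}
```

Calling `URI.create("{ \"cwlVersion\" : \"v1.0\" }")` directly throws an `IllegalArgumentException` whose message starts with `Illegal character in scheme name at index 0`, whereas `AppReference.tryParse` returns null for the same input and parses a plain path such as `/mnt/x.cwl` normally.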