Google Batch should comply with --Error is ignored flag

Open zyosufzai opened this issue 1 year ago • 7 comments

Bug report

Google Batch stops running even though nextflow has supplied a --Error is ignored flag

Expected behavior and actual behavior

Google Batch should continue running when one of the jobs fails if it has been told to ignore the error. What actually happens is that the workflow terminates with an error message (see below).

Steps to reproduce the problem

Input the following on the command line:

NXF_VER="22.08.1-edge" ./nextflow run nf-core/methylseq -r 1.6.1 -c nextflow.config -profile test,gbatch

with the following config file:

gbatch{ 
      process.executor = 'google-batch' 
      process.machineType = 'n2-standard-16' 
      process.time = '2h' 
      workDir = 'gs://nextflowdemobucket/zy-test/testrna_gbatch_tmp' 
      google.location = 'us-central1' 
      google.region  = 'us-central1' 
      google.project = ''
      params.outdir = 'gs://nextflowdemobucket/zy-test/testrna_gbatch'
      google.batch.bootDiskSize = 100.GB
      }

Program output

[2e/9436f4] NOTE: Process preseq (SRR389222_sub1) terminated with an error exit status (139) -- Error is ignored
Error executing process > 'preseq (SRR389222_sub2)'
Caused by:
  Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@7426a203[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@6126b8c[Wrapped task = TrustedListenableFutureTask@d542ca6[status=PENDING, info=[task=[running=[NOT STARTED YET], com.google.api.gax.rpc.AttemptCallable@464ab2c1]]]]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@61fde900[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]

Environment

  • Nextflow version: 22.08.1-edge
  • Java version: 17.0.3-internal
  • Operating system: Linux
  • Bash version: 5.0.3(1)-release

zyosufzai avatar Aug 31 '22 15:08 zyosufzai

Can you provide the full log? There are two different tasks, SRR389222_sub1 and SRR389222_sub2, and the first one is ignored but the second one is triggering workflow termination. Need to see what happened with the second one.

bentsherman avatar Aug 31 '22 17:08 bentsherman

Since the logs are broken up by tags, I have the log file for the tag [2e/9436f4] and also a traceback (attached): zy-test_testm_gbatch_tmp_2e_9436f428d4c0b830b66ed8bc37c994_.command.log batch-trace.txt

zyosufzai avatar Aug 31 '22 19:08 zyosufzai

Sorry, the first log file I added was for SRR389222_sub1. This is the second log file, for SRR389222_sub2: zy-test_testm_gbatch_tmp_bf_62b9e1e10833042281259ba9560df5_.command.log

zyosufzai avatar Aug 31 '22 20:08 zyosufzai

It would be great if you could upload the .nextflow.log file of the failed execution

pditommaso avatar Aug 31 '22 20:08 pditommaso

Gotcha. I couldn't find the log file for that session, so I reran the pipeline and directed the log to a known location. It shows the same errors for the same tasks. nextflow.log

zyosufzai avatar Aug 31 '22 20:08 zyosufzai

That's a weird error. It looks like it is reported by a thread pool used by the Google SDK:

Aug-31 20:35:44.041 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'preseq (SRR389222_sub2)'

Caused by:
  Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@49b3a025[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@2e0ceb8c[Wrapped task = TrustedListenableFutureTask@25c1396d[status=PENDING, info=[task=[running=[NOT STARTED YET], com.google.api.gax.rpc.AttemptCallable@2db57b9a]]]]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@aa6214[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]

java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@49b3a025[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@2e0ceb8c[Wrapped task = TrustedListenableFutureTask@25c1396d[status=PENDING, info=[task=[running=[NOT STARTED YET], com.google.api.gax.rpc.AttemptCallable@2db57b9a]]]]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@aa6214[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
	at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2065)
	at java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:833)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:340)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:562)
	at com.google.common.util.concurrent.MoreExecutors$ScheduledListeningDecorator.schedule(MoreExecutors.java:663)
	at com.google.api.gax.retrying.ScheduledRetryingExecutor.submit(ScheduledRetryingExecutor.java:116)
	at com.google.api.gax.retrying.CallbackChainRetryingFuture$AttemptCompletionListener.handle(CallbackChainRetryingFuture.java:137)
	at com.google.api.gax.retrying.CallbackChainRetryingFuture$AttemptCompletionListener.run(CallbackChainRetryingFuture.java:117)
	at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1270)
	at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1038)
	at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:808)
	at com.google.api.core.AbstractApiFuture$InternalSettableFuture.setException(AbstractApiFuture.java:94)
	at com.google.api.core.AbstractApiFuture.setException(AbstractApiFuture.java:76)
	at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
	at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:67)
	at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1132)
	at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1270)
	at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1038)
	at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:808)
	at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:572)
	at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:542)
	at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
	at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
	at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
	at com.google.api.gax.grpc.ChannelPool$ReleasingClientCall$1.onClose(ChannelPool.java:535)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562)
	at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:743)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:722)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
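
For reference, the trace above shows the rejection coming from a `ScheduledThreadPoolExecutor` whose state is `Terminated`, at the point where a retry attempt is being scheduled. The following is only a minimal sketch of that general JDK behaviour, not Nextflow or Google SDK code: the class and method names are made up for illustration, and it assumes nothing about why the pool was shut down in this run.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: shows that a shut-down scheduled pool rejects new work.
public class RejectedAfterShutdownDemo {

    // Returns true if scheduling a task on an already shut-down pool is rejected.
    public static boolean submitAfterShutdownIsRejected() {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
        pool.shutdown(); // after this call the pool accepts no new tasks
        try {
            // Stands in for a retry attempt being scheduled after the pool was closed.
            pool.schedule(() -> {}, 10, TimeUnit.MILLISECONDS);
            return false;
        } catch (RejectedExecutionException e) {
            // The default AbortPolicy throws, as in the stack trace above.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after shutdown: " + submitAfterShutdownIsRejected());
    }
}
```

If the same thing happened here, it would suggest that something scheduled a retry after the pool had already been shut down, but the trace alone does not confirm that.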

pditommaso avatar Sep 01 '22 14:09 pditommaso

So could it be that preseq failed for both SRR389222_sub1 and SRR389222_sub2, and the workflow terminated because the thread pool could not reuse its previously created threads to execute new requests? I'm wondering whether the 'ignore errors' setting isn't honored by Google Batch because it needs its own exception rule written in its JSON file. Looking at the documentation linked below, I wonder if there needs to be an "ignore_exit_status": https://cloud.google.com/batch/docs/reference/rpc/google.cloud.batch.v1

zyosufzai avatar Sep 01 '22 21:09 zyosufzai

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 18 '23 10:03 stale[bot]

Closing this in favour of #3772

pditommaso avatar Mar 19 '23 09:03 pditommaso