
Occasional "Failed to read_lines" with IOException

Open karlkashofer opened this issue 1 year ago • 4 comments

We run custom WDL genomics pipelines and also parts of the BioWDL somaticcalling pipeline (https://biowdl.github.io/) using the broadinstitute/cromwell:64 cromwell container with a gridengine backend.

Occasionally we see errors like the below:

cromwell_1 | 2023-03-20 14:56:42,590 cromwell-system-akka.dispatchers.engine-dispatcher-193 INFO - WorkflowManagerActor: Workflow 3f4aa8de-d1a9-419a-b9b4-10f9ed0a9d53 failed (during ExecutingWorkflowState): cromwell.backend.standard.StandardAsyncExecutionActor$$anon$2: Failed to evaluate job outputs:
cromwell_1 | Bad output 'scatterList.scatters': Failed to read_lines("/home/devarea/karl/PathoCromwell/cromwell-executions/Agilent_Exome_Single/3f4aa8de-d1a9-419a-b9b4-10f9ed0a9d53/call-SomaticVariants/SomaticVariantcalling/0c6cefa3-4c84-497f-8003-b0d7221bedbc/call-mutect2/Mutect2/fb66e417-4c06-4f15-8607-da8261e16448/call-scatterList/execution/stdout") (reason 1 of 1): [Attempted 1 time(s)] - IOException: Could not read from /home/devarea/karl/PathoCromwell/cromwell-executions/Agilent_Exome_Single/3f4aa8de-d1a9-419a-b9b4-10f9ed0a9d53/call-SomaticVariants/SomaticVariantcalling/0c6cefa3-4c84-497f-8003-b0d7221bedbc/call-mutect2/Mutect2/fb66e417-4c06-4f15-8607-da8261e16448/call-scatterList/execution/stdout: /home/devarea/karl/PathoCromwell/cromwell-executions/Agilent_Exome_Single/3f4aa8de-d1a9-419a-b9b4-10f9ed0a9d53/call-SomaticVariants/SomaticVariantcalling/0c6cefa3-4c84-497f-8003-b0d7221bedbc/call-mutect2/Mutect2/fb66e417-4c06-4f15-8607-da8261e16448/call-scatterList/execution/stdout
cromwell_1 | at cromwell.backend.standard.StandardAsyncExecutionActor.$anonfun$handleExecutionSuccess$1(StandardAsyncExecutionActor.scala:973)
cromwell_1 | at scala.util.Success.$anonfun$map$1(Try.scala:255)
cromwell_1 | at scala.util.Success.map(Try.scala:213)
cromwell_1 | at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
cromwell_1 | at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
cromwell_1 | at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
cromwell_1 | at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
cromwell_1 | at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
cromwell_1 | at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:92)

Basically, Cromwell fails to read a stdout file written by a job. When we check the file, it exists and contains data, so I suspect some kind of I/O problem. All data is on NFS shares, which are usually quite stable, and we see no errors in any of the filesystem/NFS backends. It seems to mostly happen in this scatter step, so I suspect a race condition or timeout somewhere in there; however, this job was only creating a scatter over 12 BED files, so it is really not that big. Just re-running the job usually works, but it is extremely annoying.

Call caching is disabled: call-caching { enabled = false }

Any idea what to do?

karlkashofer avatar Mar 21 '23 08:03 karlkashofer

The reason for this problem may be that on NFS the write of the stdout file is asynchronous: after the script finishes, Cromwell picks up the rc file while the stdout file may still be being written (data still flushing from memory to disk), and this causes the error.

Workaround: after the script finishes, move the rc file to another directory, sleep for a few seconds (the exact duration should be tuned to the NFS setup), and then move the rc file back into Cromwell's standard directory, as sketched below.

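A minimal sketch of that workaround, assuming the standard Cromwell execution directory layout; the script name, argument handling, and the 5-second default are illustrative only:

#!/bin/bash
# Hedged sketch of the rc-delay workaround described above (not Cromwell code).
# Usage: delay_rc.sh <call-directory> [delay-seconds]
set -euo pipefail
cwd="$1"          # the task's call directory under cromwell-executions
delay="${2:-5}"   # tune to the local NFS propagation delay
# Park the rc file so Cromwell's polling does not yet see the job as finished.
mv "$cwd/execution/rc" "$cwd/execution/rc.tmp"
# Give NFS time to expose the fully written stdout/stderr files.
sleep "$delay"
# Restore rc; only now will Cromwell read the task outputs.
mv "$cwd/execution/rc.tmp" "$cwd/execution/rc"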

honestAnt avatar May 22 '23 05:05 honestAnt

I had the same problem. The code that generates stdout and stderr is included in StandardAsyncExecutionActor.scala:

    // The `tee` trickery below is to be able to redirect to known filenames for CWL while also streaming
    // stdout and stderr for PAPI to periodically upload to cloud storage.
    // https://stackoverflow.com/questions/692000/how-do-i-write-stderr-to-a-file-while-using-tee-with-a-pipe
    (errorOrDirectoryOutputs, errorOrGlobFiles).mapN((directoryOutputs, globFiles) =>
    s"""|#!$jobShell
        |DOCKER_OUTPUT_DIR_LINK
        |cd ${cwd.pathAsString}
        |tmpDir=$temporaryDirectory
        |$tmpDirPermissionsAdjustment
        |export _JAVA_OPTIONS=-Djava.io.tmpdir="$$tmpDir"
        |export TMPDIR="$$tmpDir"
        |export HOME="$home"
        |(
        |cd ${cwd.pathAsString}
        |SCRIPT_PREAMBLE
        |)
        |$out="$${tmpDir}/out.$$$$" $err="$${tmpDir}/err.$$$$"
        |mkfifo "$$$out" "$$$err"
        |trap 'rm "$$$out" "$$$err"' EXIT
        |touch $stdoutRedirection $stderrRedirection
        |tee $stdoutRedirection < "$$$out" &
        |tee $stderrRedirection < "$$$err" >&2 &
        |(
        |cd ${cwd.pathAsString}
        |ENVIRONMENT_VARIABLES
        |INSTANTIATED_COMMAND
        |) $stdinRedirection > "$$$out" 2> "$$$err"
        |echo $$? > $rcTmpPath
        |$emptyDirectoryFillCommand
        |(
        |cd ${cwd.pathAsString}
        |SCRIPT_EPILOGUE
        |${globScripts(globFiles)}
        |${directoryScripts(directoryOutputs)}
        |)
        |mv $rcTmpPath $rcPath
        |""".stripMargin
      .replace("SCRIPT_PREAMBLE", scriptPreamble)
      .replace("ENVIRONMENT_VARIABLES", environmentVariables)
      .replace("INSTANTIATED_COMMAND", commandString)
      .replace("SCRIPT_EPILOGUE", scriptEpilogue)
      .replace("DOCKER_OUTPUT_DIR_LINK", dockerOutputDir))
  }

SCRIPT_EPILOGUE defaults to sync and can be overridden with the script-epilogue variable in the configuration (this is not explained in the Cromwell documentation but it is explained here). Maybe the problem could have been solved by also replacing $stdoutRedirection and $stderrRedirection with something like $stdoutRedirectionTmp and $stderrRedirectionTmp, and then replacing:

mv $rcTmpPath $rcPath

with:

mv $stdoutRedirectionTmp $stdoutRedirection
mv $stderrRedirectionTmp $stderrRedirection
mv $rcTmpPath $rcPath

This way stdout and stderr would be created on the NFS filesystem at the same time as the rc file, which would increase the likelihood that they all get synced together. However, this would not give the intended behavior when running in Google Cloud.
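
For illustration, here is a hedged, stand-alone sketch of how the tail of the generated script would look with that change; stdout.tmp and stderr.tmp stand in for the hypothetical $stdoutRedirectionTmp and $stderrRedirectionTmp targets, and the echo commands stand in for the instantiated task command:

#!/bin/bash
# Hedged sketch of the proposed change, not Cromwell's actual generated script.
tmpDir="$(mktemp -d)"
out="$tmpDir/out.$$"
err="$tmpDir/err.$$"
mkfifo "$out" "$err"
trap 'rm -rf "$tmpDir"' EXIT
touch stdout.tmp stderr.tmp
# tee into temporary redirection targets instead of the final stdout/stderr
tee stdout.tmp < "$out" &
tee stderr.tmp < "$err" >&2 &
# stand-in for the instantiated task command
( echo "task output"; echo "task error" >&2 ) > "$out" 2> "$err"
echo $? > rc.tmp
sync    # default SCRIPT_EPILOGUE
# rename everything into place together so rc never appears on the shared
# filesystem before stdout/stderr do
mv stdout.tmp stdout
mv stderr.tmp stderr
mv rc.tmp rc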

Another problem that I have noticed is that there are multiple places in the Cromwell documentation that advise, when running Cromwell with SLURM, to use configurations such as (see here):

sbatch \
  -o ${cwd}/execution/stdout \
  -e ${cwd}/execution/stderr \

or as (see here, or here, or here, or here)

sbatch \
  -o ${out} \
  -e ${err} \

which overwrite the stdout and stderr files written by the script, which seems inappropriate. The following should have been used instead:

sbatch \
  -o ${out}.sbatch \
  -e ${err}.sbatch \

This is similar to what is advised for SGE, where ${out}.qsub and ${err}.qsub are used in place of ${out} and ${err}.
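
For comparison, a rough paraphrase of the SGE submit example (flags abbreviated; only the .qsub suffixing of -o and -e matters here):

qsub \
  -N ${job_name} \
  -wd ${cwd} \
  -o ${out}.qsub \
  -e ${err}.qsub \
  /usr/bin/env bash ${script}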

The workaround currently suggested by @honestAnt is instead to use something like this in the Cromwell configuration file:

submit-docker = """
...
sbatch \
  --wait \
  -J ${job_name} \
  -D ${cwd} \
  -o ${out}.sbatch \
  -e ${err}.sbatch \
  -t ${runtime_minutes} \
  -c ${cpu} \
  --mem=${memory_mb} \
  --wrap "singularity exec --containall --bind ${cwd}:${docker_cwd} docker://${docker} ${job_shell} ${docker_script}"
mv ${cwd}/execution/rc ${cwd}/execution/rc.tmp
sleep 60
mv ${cwd}/execution/rc.tmp ${cwd}/execution/rc
"""

A better alternative would be to use something like this in the Cromwell configuration file (as suggested here):

script-epilogue = "sleep 60 && sync"

submit-docker = """
...
sbatch \
  --wait \
  -J ${job_name} \
  -D ${cwd} \
  -o ${out}.sbatch \
  -e ${err}.sbatch \
  -t ${runtime_minutes} \
  -c ${cpu} \
  --mem=${memory_mb} \
  --wrap "singularity exec --containall --bind ${cwd}:${docker_cwd} docker://${docker} ${job_shell} ${docker_script}"
"""

But both options are far from ideal, and the choice of 60 seconds might not be sufficient for all NFS configurations. If anybody wants to try to trigger this issue on an NFS shared filesystem setup, the following WDL should do the trick:

version 1.0

workflow main {
  scatter (idx in range(256)) {
    call main {
      input:
        i = idx
    }
  }
  output { Array[Int] n = main.n }
}

task main {
  input {
    Int i
  }

  command <<<
    set -euo pipefail
    echo ~{i*i}
  >>>

  output {
    Int n = read_int(stdout())
  }

  runtime {
    docker: "debian:stable-slim"
  }
}
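
For example, it can be submitted in Cromwell's run mode against a configuration that uses the NFS-backed backend (the file names below are placeholders):

java -Dconfig.file=backend.conf -jar cromwell.jar run trigger_nfs_race.wdl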

freeseek avatar Jul 05 '23 16:07 freeseek

After doing some testing with a collaborator who is able to reproduce the problem, I was able to gather that:

  • script-epilogue = "sleep 5 && sync" worked
  • script-epilogue = "ls -l stdout stderr && sync" worked
  • script-epilogue = "ls && sync" failed

with the ls -l suggestion coming from here. My guess is that tee might not be writing to stdout/stderr fast enough, and that by the time the sync command (which is what script-epilogue defaults to) runs, the stdout/stderr files might still be incomplete. So if I understand this correctly, while this is related to NFS synchronization, it is not strictly an NFS synchronization problem but rather a synchronization problem between tee and sync.
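
To make the suspected race concrete, here is a stripped-down, illustrative sketch of the relevant part of the generated script (not Cromwell's actual code): the task writes through a FIFO to a background tee, and nothing makes the parent shell wait for tee to drain before the epilogue's sync and the rc rename run.

#!/bin/bash
# Illustrative sketch of the suspected tee/sync race.
out="$(mktemp -u)"
mkfifo "$out"
trap 'rm -f "$out"' EXIT
tee stdout < "$out" &        # background tee; may still be flushing ...
( seq 1 100000 ) > "$out"    # ... after the task command itself has finished
echo $? > rc.tmp
sync                         # default script-epilogue: flushes what is already
                             # on disk, not what tee still holds in its buffer
mv rc.tmp rc                 # rc can appear before stdout is complete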

freeseek avatar Jul 07 '23 19:07 freeseek

tee buffers the program's standard output; when sync is run, that data is flushed from the buffer to disk, and after that NFS synchronizes it to the remote server (NFS itself has a delay of a few seconds). So whether the problem lies with tee and sync or with NFS, delaying the rc.tmp-to-rc move by a few seconds is the best option that does not require changing the source code.

honestAnt avatar Sep 08 '23 02:09 honestAnt