ScramblePart1 139 error code

Open MattWellie opened this issue 4 months ago • 1 comments

Bug Report

Affected module(s) or script(s)

wdl/GatherSampleEvidence

Affected version(s)

This codebase as-of this commit

Description

I've seen a couple of failing jobs recently where ScramblePart1 terminates with exit code 139. It looks like this is a kill signal from the OS/hypervisor, sent when the tool tries to access memory it has no permission to use. This is always coupled with a Wham failure. Is the 139 likely to be meaningful in itself, or is it potentially a kill signal sent because a separate part of the workflow had failed, so all jobs needed to be stopped?

N.B. these samples had been running for an entire week on this variant calling stage, which typically takes only a few hours. The trace below is from a re-run, which picked up the prior run's results. E.g.

Jobs:
  [#] LocalizeReads (26s)
    Call caching: true
  [!] Whamg (5h:19m:10s)
    stdout: None
    stderr: None
    rc: None
    error: Workflow failed, caused by: Task Whamg.RunWhamgOnCram:NA:2 failed. Job exit code 137. (...)
  [#] CollectCounts (18s)
    Call caching: true
  [#] Manta (33s)
  [#] CollectSVEvidence (26s)
  [!] Scramble (1h:0m:57s)
    stdout: None
    stderr: None
    rc: None
    error: Workflow failed, caused by: Job Scramble.ScramblePart1:NA:2 exited with return code 139 which has not been declared as a valid return code
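
For reference when reading the two exit codes in the trace: by shell convention, an exit code above 128 usually means the process was terminated by a signal, with the signal number equal to the code minus 128. The sketch below decodes 137 and 139 on that assumption. The helper name `describe_exit_code` is made up for illustration and is not part of gatk-sv or Cromwell, and the signal names it prints are those of the machine running the script (Linux assumed).

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit code (hypothetical helper, not a gatk-sv API).

    Assumes the common convention that codes above 128 mean
    128 + <signal number>.
    """
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"unknown signal {signum}"
        return f"terminated by {name}"
    if code == 0:
        return "exited normally"
    return f"exited with status {code}"


if __name__ == "__main__":
    # The two codes reported in the trace above.
    for rc in (137, 139):
        print(rc, describe_exit_code(rc))
```

Under this reading, 139 is a segmentation fault (SIGSEGV) raised inside ScramblePart1 itself, while 137 is a SIGKILL delivered to Whamg from outside the process. A SIGKILL is often associated with the out-of-memory killer, though the trace alone does not confirm that.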

MattWellie, Mar 06 '24 00:03