gatk-sv
ScramblePart1 139 error code
Bug Report
Affected module(s) or script(s)
wdl/GatherSampleEvidence
Affected version(s)
This codebase as of this commit
Description
I've recently seen a couple of failing jobs where ScramblePart1 terminates with exit code 139. As far as I can tell, this corresponds to a segmentation fault (128 + 11, i.e. SIGSEGV): the OS kills the process when it tries to access memory it has no permission to use. In every case this is coupled with a Wham failure. Is the 139 exit code likely to be meaningful in itself, or could it be a kill signal sent because a separate part of the workflow had failed and all jobs needed to be stopped?
N.B. these samples had been running for an entire week on this variant-calling stage, which typically takes only a few hours. The trace below is from a re-run, which picked up the prior run's results. For example:
Jobs:
[#] LocalizeReads (26s)
Call caching: true
[!] Whamg (5h:19m:10s)
stdout: None
stderr: None
rc: None
error: Workflow failed, caused by: Task Whamg.RunWhamgOnCram:NA:2 failed. Job exit code 137. (...)
[#] CollectCounts (18s)
Call caching: true
[#] Manta (33s)
[#] CollectSVEvidence (26s)
[!] Scramble (1h:0m:57s)
stdout: None
stderr: None
rc: None
error: Workflow failed, caused by: Job Scramble.ScramblePart1:NA:2 exited with return code 139 which has not been declared as a valid return code
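For reference when reading the two exit codes above: shells conventionally report a process terminated by signal N as exit code 128 + N. The snippet below is a minimal sketch of that decoding, assuming Linux signal numbering. `describe_exit_code` is a hypothetical helper written for this illustration and is not part of gatk-sv or Cromwell.

```python
import signal


def describe_exit_code(rc: int) -> str:
    """Describe a shell-style exit code.

    Assumption: codes above 128 follow the common 128 + signal-number
    convention, and signal numbers are those of the current platform.
    """
    if rc > 128:
        signum = rc - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            return f"terminated by {signal.Signals(signum).name}"
        except ValueError:
            return f"terminated by unknown signal {signum}"
    return f"exited normally with status {rc}"


# The two codes reported in the trace above.
for rc in (137, 139):
    print(rc, describe_exit_code(rc))
```

Under that convention, 137 (Whamg) decodes to SIGKILL and 139 (ScramblePart1) to SIGSEGV. A SIGKILL is often seen when a container or the kernel stops a task that ran out of memory, but the exit code alone does not show who sent the signal, so this should be read as a hint rather than a diagnosis.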