LJA
Crash on dinucleotide error correction
I have also seen this error a few times, but it seems fairly nondeterministic. For example, this run crashed:
04:32:08 73.9Gb INFO: Applying changes to the graph
05:15:48 105Gb INFO: Collecting and storing read suffixes
05:36:48 108.3Gb INFO: Correcting dinucleotide errors in reads
Child process crashed
while this one made it past that step (currently still running):
02:54:37 70.8Gb INFO: Applying changes to the graph
03:34:20 102.6Gb INFO: Collecting and storing read suffixes
03:51:56 108.8Gb INFO: Correcting dinucleotide errors in reads
05:50:43 108.8Gb INFO: Applying corrections to reads
06:00:06 109.2Gb INFO: Applied correction to 302606 reads
06:00:07 109.2Gb INFO: Corrected 302606 dinucleotide sequences
06:00:07 109.2Gb INFO: Marking reliable edges
06:00:29 109.2Gb INFO: Marked 1017912 edges in 248165 paths as reliable
06:00:30 109.2Gb INFO: Correcting low covered regions in reads with K = 800
08:31:43 110.4Gb INFO: Applying corrections to reads
08:59:50 111.5Gb INFO: Applied correction to 982484 reads
08:59:50 111.5Gb INFO: Corrected low covered regions in 982484 reads with K = 800
08:59:50 111.5Gb INFO: Applying changes to the graph
09:43:54 137.9Gb INFO: Marking reliable edges
09:44:02 137.9Gb INFO: Marked 116356 edges in 37311 paths as reliable
09:44:02 137.9Gb INFO: Correcting low covered regions in reads with K = 2000
11:43:52 137.9Gb INFO: Applying corrections to reads
11:57:48 137.9Gb INFO: Applied correction to 101111 reads
11:57:49 137.9Gb INFO: Corrected low covered regions in 101111 reads with K = 2000
11:57:49 137.9Gb INFO: Applying changes to the graph
12:53:23 156.8Gb INFO: Correcting dinucleotide errors in reads
I've seen this a few times: literally re-running the same command will sometimes crash with the "Child process crashed" error, and at different points in the run. This is on an HPC cluster, so the runs could land on different nodes with different CPUs etc. I haven't ever made it past the first error correction (either due to a crash or a 24-hour wall-time limit), so I am hoping the current 120-hour job will make it further. The data for this sample is trio binned, so I can share the FASTQ if wanted.
Originally posted by @ASLeonard in https://github.com/AntonBankevich/LJA/issues/14#issuecomment-1058939752
As an update, the longer job did finish and reached the end of LJA. I compared the two logs, and they were identical up to the point of the crash, so there is nothing obvious to explain why it would crash sometimes and finish fine at other times. The jobs did run on different nodes, but both should support the same CPU instructions, so that is unlikely to be the cause.
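
For anyone wanting to repeat the log comparison, here is a minimal Python sketch of one way to do it. It is not part of LJA; the helper names (`messages`, `first_divergence`) are hypothetical, and it assumes every log line carries the "HH:MM:SS <memory> INFO: " prefix seen in the excerpts above. That prefix is stripped first, since timestamps and memory usage differ between runs even when the sequence of steps is the same.

```python
import re
from itertools import zip_longest

# Assumed prefix format, taken from the excerpts above: "HH:MM:SS <memory> INFO: "
PREFIX = re.compile(r"^\d{2}:\d{2}:\d{2}\s+\S+\s+INFO:\s*")


def messages(lines):
    """Strip the timestamp/memory prefix and drop blank lines."""
    return [PREFIX.sub("", line.strip()) for line in lines if line.strip()]


def first_divergence(log_a, log_b):
    """Return (index, message_a, message_b) for the first differing message.

    Returns None if both logs contain the same sequence of messages.
    A missing message (one log is shorter) is reported as None.
    """
    pairs = zip_longest(messages(log_a), messages(log_b))
    for index, (msg_a, msg_b) in enumerate(pairs):
        if msg_a != msg_b:
            return index, msg_a, msg_b
    return None


# The first lines of the two runs quoted in this issue.
crashed = [
    "04:32:08 73.9Gb INFO: Applying changes to the graph",
    "05:15:48 105Gb INFO: Collecting and storing read suffixes",
    "05:36:48 108.3Gb INFO: Correcting dinucleotide errors in reads",
    "Child process crashed",
]
finished = [
    "02:54:37 70.8Gb INFO: Applying changes to the graph",
    "03:34:20 102.6Gb INFO: Collecting and storing read suffixes",
    "03:51:56 108.8Gb INFO: Correcting dinucleotide errors in reads",
    "05:50:43 108.8Gb INFO: Applying corrections to reads",
]

print(first_divergence(crashed, finished))
# (3, 'Child process crashed', 'Applying corrections to reads')
```

On real log files, the lists could be replaced by open file handles, e.g. `first_divergence(open(path_a), open(path_b))`, with the paths being whatever the two runs wrote.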