LJA
LJA copied to clipboard
child process crashed
Using latest version of LJA (git cloned, cd into LJA, compiled with cmake version 3.18.2, gcc 8.2.0) on a Linux-based cluster, it looks like it runs all the way until it attempts to uncompress homopolymers. The last few lines of the job log file:
00:06:31 161Mb INFO: Exporting remaining active transitions
00:06:31 161Mb INFO: Export to Dot
00:06:31 161Mb INFO: Export to GFA and compressed contigs
00:06:33 210Mb INFO: Finished repeat resolution
00:06:33 2Mb INFO: Performing polishing and homopolymer uncompression
00:06:33 7Mb INFO: Aligning reads back to assembly
00:06:38 1.1Gb INFO: Finished alignment.
00:06:38 1.1Gb INFO: Printing alignments to "lja_assembly/uncompressing/alignments.txt"
00:06:39 1.5Gb INFO: Reading and processing initial reads from ["hifi_data/hifi_reads.fasta"]
00:07:43 2.8Gb INFO: Uncompressing homopolymers in contigs
Child process crashed
And the last few lines of dbg.log
00:07:31 2.8Gb TRACE: Processed 230000 original reads
00:07:31 2.8Gb TRACE: Processed 231000 original reads
00:07:31 2.8Gb TRACE: Processed 232000 original reads
00:07:31 2.8Gb TRACE: Processed 233000 original reads
00:07:31 2.8Gb TRACE: Processed 234000 original reads
00:07:31 2.8Gb TRACE: Processed 235000 original reads
00:07:31 2.8Gb TRACE: Processed 236000 original reads
00:07:43 2.8Gb TRACE: Processed final batch of 22540 compressed reads
00:07:43 2.8Gb INFO: Uncompressing homopolymers in contigs
00:07:43 2.8Gb TRACE: Generating consensus for contig 104
core dump
Core was generated by `LJA/bin/lja -t 28 -o lja_assembly --reads hifi_data/hifi_reads_aligned_to_refer'.
Program terminated with signal 4, Illegal instruction.
#0 0x000000000057e4a6 in ?? ()
(gdb) bt
#0 0x000000000057e4a6 in ?? ()
#1 0x0000000000e90e10 in ?? ()
#2 0x00000000000000c9 in ?? ()
#3 0x000000000000001c in ?? ()
#4 0x0000000000555f79 in ?? ()
#5 0x00000000000000bf in ?? ()
#6 0x000000000002ef71 in ?? ()
#7 0x000000000255e930 in ?? ()
#8 0x00002aab18175390 in ?? ()
#9 0x00007fffffffa590 in ?? ()
#10 0x000000000255e880 in ?? ()
#11 0x00007fffffff8a80 in ?? ()
#12 0x00000000000000c9 in ?? ()
#13 0x00000000000869a5 in ?? ()
#14 0x00002aab1807d550 in ?? ()
#15 0x00002aab1814c9b0 in ?? ()
#16 0x00002aab18157aa0 in ?? ()
#17 0x00002aab18157aa0 in ?? ()
#18 0x0000000000000014 in ?? ()
#19 0x0000000000000000 in ?? ()
I'm using the following script (the genome is haploid:)
LJA/bin/lja -t 28 -o lja_assembly --reads hifi_data/hifi_reads.fasta
This doesn't seem to be a reads-specific error, as the same error in the same place occurred on two separate genomes. I'm not sure what other information I can give to help -- the log files aren't too helpful for troubleshooting.
I don't think it's a memory issue either because when I look at the job information after the fact, it says exit status 1. Usually OOM errors are exit status 137 (and I set memory to 300 Gb, so it shouldn't be an issue.)
Hi! Thank you for your interest in LJA and sorry for the crash. Unfortunately this is the first time I see such problem so I can not provide a quick fix. Would it be possible for you to share your data with me?
Unfortunately I cannot, and I don't think it would be any help because this is happening with HiFi reads of different genomes. I'm 100% confident the FASTA files that contain the CCS reads are not corrupted or anything -- it works running through hifiasm, hicanu, Flye, etc.
However, using gdb I was able to parse the core dump file a bit more and will help troubleshoot:
Program terminated with signal 4, Illegal instruction.
#0 0x000000000057e4a6 in _M_create_storage (__n=<optimized out>, this=<optimized out>)
at /LJA/src/lib/spoa/src/graph.cpp:65
65 Graph::Graph()
And the backtrace:
#0 0x000000000057e4a6 in _M_create_storage (__n=<optimized out>, this=<optimized out>)
at /LJA/src/lib/spoa/src/graph.cpp:65
#1 _Vector_base (__a=..., __n=<optimized out>, this=<optimized out>) at /cm/local/apps/gcc/8.2.0/include/c++/8.2.0/bits/stl_vector.h:260
#2 vector (__n=256, __a=..., __value=<optimized out>, this=0x2aaaae7afd78) at /cm/local/apps/gcc/8.2.0/include/c++/8.2.0/bits/stl_vector.h:429
#3 spoa::Graph::Graph() () at/LJA/src/lib/spoa/src/graph.cpp:73
#4 0x0000000000555f79 in MSAConsensus (this=0x255e880, logger=..., s=...)
at /LJA/src/projects/polishing/homopolish.cpp:173
#5 ContigInfo::GenerateConsensus[abi:cxx11](logging::Logger&) (this=<optimized out>, logger=...)
at /LJA/src/projects/polishing/homopolish.cpp:257
#6 0x00002aaaab5812de in ?? () from /cm/local/apps/gcc/8.2.0/lib64/libgomp.so.1
#7 0x00002aaaab9b8e25 in start_thread () from /lib64/libpthread.so.0
#8 0x00002aaaabccbbad in __opensock () from /lib64/libc.so.6
#9 0x0000000000000000 in ?? ()
And the detailed backtrace:
#0 0x000000000057e4a6 in _M_create_storage (__n=<optimized out>, this=<optimized out>)
at /LJA/src/lib/spoa/src/graph.cpp:65
No locals.
#1 _Vector_base (__a=..., __n=<optimized out>, this=<optimized out>) at /cm/local/apps/gcc/8.2.0/include/c++/8.2.0/bits/stl_vector.h:260
No locals.
#2 vector (__n=256, __a=..., __value=<optimized out>, this=0x2aaaae7afd78) at /cm/local/apps/gcc/8.2.0/include/c++/8.2.0/bits/stl_vector.h:429
No locals.
#3 spoa::Graph::Graph() () at /LJA/src/lib/spoa/src/graph.cpp:73
No locals.
#4 0x0000000000555f79 in MSAConsensus (this=0x255e880, logger=..., s=...)
at /LJA/src/projects/polishing/homopolish.cpp:173
alignment_engine = {_M_t = {
_M_t = {<std::_Tuple_impl<0, spoa::AlignmentEngine*, std::default_delete<spoa::AlignmentEngine> >> = {<std::_Tuple_impl<1, std::default_delete<spoa::AlignmentEngine> >> = {<std::_Head_base<1, std::default_delete<spoa::AlignmentEngine>, true>> = {<std::default_delete<spoa::AlignmentEngine>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, spoa::AlignmentEngine*, false>> = {
_M_head_impl = 0x2aab1807d550}, <No data fields>}, <No data fields>}}}
only_good = <optimized out>
pref_remove = <optimized out>
suf_remove = <optimized out>
med_len = <optimized out>
graph = {num_codes_ = 0, coder_ = {<std::_Vector_base<int, std::allocator<int> >> = {
_M_impl = {<std::allocator<int>> = {<__gnu_cxx::new_allocator<int>> = {<No data fields>}, <No data fields>},
_M_start = 0x2aab18159460, _M_finish = 0x0, _M_end_of_storage = 0x2aab18159860}}, <No data fields>},
decoder_ = {<std::_Vector_base<int, std::allocator<int> >> = {
_M_impl = {<std::allocator<int>> = {<__gnu_cxx::new_allocator<int>> = {<No data fields>}, <No data fields>},
_M_start = 0x2aab1815cd60, _M_finish = 0x1615, _M_end_of_storage = 0x1615}}, <No data fields>},
sequences_ = {<std::_Vector_base<spoa::Graph::Node*, std::allocator<spoa::Graph::Node*> >> = {
_M_impl = {<std::allocator<spoa::Graph::Node*>> = {<__gnu_cxx::new_allocator<spoa::Graph::Node*>> = {<No data fields>}, <No data fields>}, _M_start = 0x100000, _M_finish = 0x2aab18162290, _M_end_of_storage = 0x1eca}}, <No data fields>},
nodes_ = {<std::_Vector_base<std::unique_ptr<spoa::Graph::Node, std::default_delete<spoa::Graph::Node> >, std::allocator<std::unique_ptr<spoa::Graph::Node, std::default_delete<spoa::Graph::Node> > > >> = {
_M_impl = {<std::allocator<std::unique_ptr<spoa::Graph::Node, std::default_delete<spoa::Graph::Node> > >> = {<__gnu_cxx::new_allocator<std::unique_ptr<spoa::Graph::Node, std::default_delete<spoa::Graph::Node> > >> = {<No data fields>}, <No data fields>}, _M_start = 0x1eca,
_M_finish = 0x54414741434154, _M_end_of_storage = 0x2aaaae7afde0}}, <No data fields>},
edges_ = {<std::_Vector_base<std::unique_ptr<spoa::Graph::Edge, std::default_delete<spoa::Graph::Edge> >, std::allocator<std::unique_ptr<spoa::Graph::Edge, std::default_delete<spoa::Graph::Edge> > > >> = {
_M_impl = {<std::allocator<std::unique_ptr<spoa::Graph::Edge, std::default_delete<spoa::Graph::Edge> > >> = {<__gnu_cxx::new_allocator<std::unique_ptr<spoa::Graph::Edge, std::default_delete<spoa::Graph::Edge> > >> = {<No data fields>}, <No data fields>}, _M_start = 0x0,
_M_finish = 0x0, _M_end_of_storage = 0x54415441435441}}, <No data fields>},
rank_to_node_ = {<std::_Vector_base<spoa::Graph::Node*, std::allocator<spoa::Graph::Node*> >> = {
_M_impl = {<std::allocator<spoa::Graph::Node*>> = {<__gnu_cxx::new_allocator<spoa::Graph::Node*>> = {<No data fields>}, <No data fields>}, _M_start = 0x7fffffff8f80, _M_finish = 0x3ee4, _M_end_of_storage = 0x7fffffff90c0}}, <No data fields>},
consensus_ = {<std::_Vector_base<spoa::Graph::Node*, std::allocator<spoa::Graph::Node*> >> = {
_M_impl = {<std::allocator<spoa::Graph::Node*>> = {<__gnu_cxx::new_allocator<spoa::Graph::Node*>> = {<No data fields>}, <No data fields>}, _M_start = 0x3ee4, _M_finish = 0x7fffffff9720, _M_end_of_storage = 0x198ca0}}, <No data fields>}}
cov = <optimized out>
all_len = <optimized out>
good_cons = <optimized out>
coverages = {<std::_Vector_base<unsigned int, std::allocator<unsigned int> >> = {
_M_impl = {<std::allocator<unsigned int>> = {<__gnu_cxx::new_allocator<unsigned int>> = {<No data fields>}, <No data fields>},
---Type <return> to continue, or q <return> to quit---
_M_start = 0x2aab1814c9b0, _M_finish = 0x2aab18157aa0, _M_end_of_storage = 0x2aab18157aa0}}, <No data fields>}
consensus = {static npos = 18446744073709551615,
_M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
_M_p = 0x2aab1814aba0 "\350\a"}, _M_string_length = 5662, {
_M_local_buf = "\000\036\000\000\000\000\000\000\200\217\377\377\377\177\000", _M_allocated_capacity = 7680}}
#5 ContigInfo::GenerateConsensus[abi:cxx11](logging::Logger&) (this=<optimized out>, logger=...)
at /LJA/src/projects/polishing/homopolish.cpp:257
start_pos = 551333
consensus = {static npos = 18446744073709551615,
_M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x0},
_M_string_length = 7882, {_M_local_buf = "\001\000\000\000 \000\000\000\240\260\001\030\253*\000",
_M_allocated_capacity = 137438953473}}
i = 191
ss = <incomplete type>
debug = <incomplete type>
total_count = <optimized out>
cur_complex_ind = <optimized out>
consensus = {static npos = 18446744073709551615,
_M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
_M_p = 0x2aaaae7afbe0 "\017\374z\256\252*"}, _M_string_length = 38, {_M_local_buf = "`\315\025\030\253*\000\000L(ū\252*\000",
_M_allocated_capacity = 46914331856224}}
#6 0x00002aaaab5812de in ?? () from /cm/local/apps/gcc/8.2.0/lib64/libgomp.so.1
No symbol table info available.
#7 0x00002aaaab9b8e25 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#8 0x00002aaaabccbbad in __opensock () from /lib64/libc.so.6
No symbol table info available.
#9 0x0000000000000000 in ?? ()
No symbol table info available.
Hi,
I have the same error or kind of error, after 4 hours ... 03:06:01 5Gb INFO: Export to GFA and compressed contigs 03:06:58 5.7Gb INFO: Finished repeat resolution 03:07:07 2Mb INFO: Performing polishing and homopolymer uncompression 03:07:42 0.5Gb INFO: Aligning reads back to assembly 03:17:28 20.7Gb INFO: Finished alignment. 03:17:28 20.7Gb INFO: Printing alignments to "lja_ec/uncompressing/alignments.txt" 03:18:04 35.3Gb INFO: Reading and processing initial reads from ["reads.fa"] 04:06:51 36.9Gb INFO: Uncompressing homopolymers in contigs Child process crashed
Hi chklopp! Sorry to hear that. Unfortunately we still can not reproduce this problem on our data. Maybe you could share your data with me? it would be really helpful.
No sorry, it is not my data and it is was just to test your code. We already have a good assembly for it with hifiasm.
I also have seen this error a few times, but it seems pretty nondeterministic. For example this run crashed
04:32:08 73.9Gb INFO: Applying changes to the graph
05:15:48 105Gb INFO: Collecting and storing read suffixes
05:36:48 108.3Gb INFO: Correcting dinucleotide errors in reads
Child process crashed
while this one made it past (currently still running)
02:54:37 70.8Gb INFO: Applying changes to the graph
03:34:20 102.6Gb INFO: Collecting and storing read suffixes
03:51:56 108.8Gb INFO: Correcting dinucleotide errors in reads
05:50:43 108.8Gb INFO: Applying corrections to reads
06:00:06 109.2Gb INFO: Applied correction to 302606 reads
06:00:07 109.2Gb INFO: Corrected 302606 dinucleotide sequences
06:00:07 109.2Gb INFO: Marking reliable edges
06:00:29 109.2Gb INFO: Marked 1017912 edges in 248165 paths as reliable
06:00:30 109.2Gb INFO: Correcting low covered regions in reads with K = 800
08:31:43 110.4Gb INFO: Applying corrections to reads
08:59:50 111.5Gb INFO: Applied correction to 982484 reads
08:59:50 111.5Gb INFO: Corrected low covered regions in 982484 reads with K = 800
08:59:50 111.5Gb INFO: Applying changes to the graph
09:43:54 137.9Gb INFO: Marking reliable edges
09:44:02 137.9Gb INFO: Marked 116356 edges in 37311 paths as reliable
09:44:02 137.9Gb INFO: Correcting low covered regions in reads with K = 2000
11:43:52 137.9Gb INFO: Applying corrections to reads
11:57:48 137.9Gb INFO: Applied correction to 101111 reads
11:57:49 137.9Gb INFO: Corrected low covered regions in 101111 reads with K = 2000
11:57:49 137.9Gb INFO: Applying changes to the graph
12:53:23 156.8Gb INFO: Correcting dinucleotide errors in reads
I've seen this a few times where literally re-running the same command will sometimes crash with the Child process crashed
error and at different times. This is on a HPC, so could be different nodes with different CPUs etc. I haven't ever made it past the first error correction (either due to crash or a 24 hour wall limit), so am hoping the current 120h job will make it further. It is trio binned data for this sample, so I can share the fastq if wanted.
Hi @ASLeonard . Your issue is different from what others have in this thread so I created a new issue and I will work on it separately.
Hi @AntonBankevich. I also have the "child process crashed " error when I run LJA with 100X HiFi reads of human. I report as following:
My command is:
export PATH={dir_lja}:$PATH
lja -t 48 --diploid -o 02_LJA/child.HiFi.LJA_default-2.0 --reads /data/DATA/RAWDATA/HiFi/fastqs/m64154_210327_091530.ccs.fq.gz --reads /data/DATA/RAWDATA/HiFi/fastqs/m64154_210328_153009.ccs.fq.gz --reads /data/DATA/RAWDATA/HiFi/fastqs/m64154_210403_042108.ccs.fq.gz --reads /data/DATA/RAWDATA/HiFi/fastqs/m64236_210323_112137.ccs.fq.gz --reads /data/DATA/RAWDATA/HiFi/fastqs/m64236_210330_051441.ccs.fq.gz --reads /data/DATA/RAWDATA/HiFi/fastqs/m64154_210318_093818.ccs.fq.gz --reads /data/DATA/RAWDATA/HiFi/fastqs/m64154_210322_170722.ccs.fq.gz --reads /data/DATA/RAWDATA/HiFi/fastqs/m64154_210323_233537.ccs.fq.gz --reads /data/DATA/RAWDATA/HiFi/fastqs/m64154_210325_060339.ccs.fq.gz --reads /data/DATA/RAWDATA/HiFi/fastqs/m64154_210401_215258.ccs.fq.gz
part of log:
46:09:26 56.3Gb INFO: Collected 863 overlaps. Looking for unique overlaps
46:09:27 56.3Gb INFO: Collected 44 unique overlaps.
46:09:27 56.3Gb INFO: Adding new connections to the graph
46:15:01 56.3Gb INFO: Correcting tips using reliable edge marks
46:16:47 56.3Gb INFO: Applying corrections to reads
46:16:48 56.3Gb INFO: Applied correction to 114731 reads
46:16:57 56.3Gb INFO: Applying changes to the graph
46:25:31 56.3Gb INFO: Could not correct 181425 reads. They will be removed.
46:25:32 56.3Gb INFO: Uncorrected reads were removed.
46:25:32 56.3Gb INFO: Applying changes to the graph
Child process crashed
Please see the details from following two log files. std.log dbg.log.zip
Best, Peng