NextDenovo
Segmentation fault (core dumped) at 03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0
Describe the bug
I have been running NextDenovo (v2.4.0) for the last couple of months on a de novo genome assembly (estimated genome size ~2.5 Gb). The jobs in 02.cns_align.sh.work took two months to finish, but everything ran OK. However, once the pipeline moved on to 03.ctg_graph, it stopped with a segmentation fault in 01.ctg_graph.sh.work/ctg_graph0.
Error message
```
#$ tail -f 10 pid7227.log.info
[INFO] 2023-03-25 08:09:41,174 Submit jobID:[61788] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/02.cns_align/02.cns_align.sh.work/cns_align23/nextDenovo.sh] in the local_cycle.
[INFO] 2023-03-26 22:35:08,091 Submit jobID:[48347] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/02.cns_align/02.cns_align.sh.work/cns_align24/nextDenovo.sh] in the local_cycle.
[INFO] 2023-03-28 17:27:35,526 Submit jobID:[52069] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/02.cns_align/02.cns_align.sh.work/cns_align25/nextDenovo.sh] in the local_cycle.
[INFO] 2023-03-29 15:05:12,344 Submit jobID:[54523] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/02.cns_align/02.cns_align.sh.work/cns_align26/nextDenovo.sh] in the local_cycle.
[INFO] 2023-04-06 12:48:15,650 Submit jobID:[60690] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/02.cns_align/02.cns_align.sh.work/cns_align27/nextDenovo.sh] in the local_cycle.
[INFO] 2023-04-15 11:51:34,786 cns_align done
[INFO] 2023-04-15 11:51:39,917 Total jobs: 1
[INFO] 2023-04-15 11:51:39,929 Submit jobID:[15206] jobCmd:[/share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo.sh] in the local_cycle.
[ERROR] 2023-04-15 11:51:59,560 ctg_graph failed: please check the following logs:
[ERROR] 2023-04-15 11:51:59,561 /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo.sh.e
```
```
#$ cat /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0/nextDenovo.sh.e
hostname
+ hostname
cd /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0
+ cd /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.sh.work/ctg_graph0
/NextDenovo/bin/nextgraph -a 1 -f /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.input.seqs /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.input.ovls -o nd.asm.p.fasta;
+ /NextDenovo/bin/nextgraph -a 1 -f /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.input.seqs /share/data04/andresantos/Pseudunio_auricularius/HN00182649_hdd1/RawData/nexdenovo_assembly/03.ctg_graph/01.ctg_graph.input.ovls -o nd.asm.p.fasta
[INFO] 2023-04-15 11:51:40 Initialize graph and reading...
Segmentation fault (core dumped)
```
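When a step fails, the `pid*.log.info` file names the failing job's stderr file on an `[ERROR]` line right after `please check the following logs:`. The following is a minimal Python sketch that collects those paths so each one can be inspected with `cat`. It assumes the line format seen in the log excerpt above (level, date, time, message), and `failed_job_logs` is a hypothetical helper, not part of NextDenovo.

```python
import re
from typing import Iterable, List

# Matches e.g. "[ERROR] 2023-04-15 11:51:59,561 /path/to/nextDenovo.sh.e"
# (line format assumed from the pid*.log.info excerpt, not from NextDenovo docs)
ERROR_LOG_RE = re.compile(r"^\[ERROR\]\s+\S+\s+\S+\s+(\S+\.sh\.e)\s*$")


def failed_job_logs(lines: Iterable[str]) -> List[str]:
    """Return the *.sh.e paths reported on [ERROR] lines, in log order."""
    paths = []
    for line in lines:
        match = ERROR_LOG_RE.match(line.strip())
        if match:
            paths.append(match.group(1))
    return paths
```

For a real run, pass an open file handle, e.g. `failed_job_logs(open("pid7227.log.info"))`.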
Genome characteristics
Genome size: ~2.5 Gb
Heterozygosity rate: not estimated, since I only have PacBio reads. From experience with this organism, I would expect it to be moderate to low.
Repeat content: ~50% of the genome, based on a closely related species.
Input data

| Types | Count (#) | Length (bp) |
|---|---|---|
| N10 | 320113 | 32018 |
| N20 | 795312 | 25252 |
| N30 | 1371150 | 21441 |
| N40 | 2038774 | 18706 |
| N50 | 2799212 | 16482 |
| N60 | 3664213 | 14381 |
| N70 | 4677464 | 11970 |
| N80 | 5939270 | 9241 |
| N90 | 7694302 | 6074 |

| Types | Count (#) | Bases (bp) | Depth (X) |
|---|---|---|---|
| Raw | 13506201 | 134176569075 | 53.67 |
| Filtered | 1746043 | 777160708 | 0.31 |
| Clean | 11760158 | 133399408367 | 53.36 |
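The Depth (X) column matches total bases divided by the 2.5 Gb genome size estimate, and the Clean row equals Raw minus Filtered. Here is a short Python check of that arithmetic that uses only the numbers from the table. The helper name `depth` is illustrative.

```python
GENOME_SIZE_BP = 2_500_000_000  # estimated genome size (~2.5 Gb)

# (read count, total bases) copied from the table
raw = (13_506_201, 134_176_569_075)
filtered = (1_746_043, 777_160_708)
clean = (11_760_158, 133_399_408_367)


def depth(bases: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Sequencing depth (X) = total bases / genome size, rounded to 2 decimals."""
    return round(bases / genome_size, 2)


# Clean = Raw - Filtered, for both read counts and bases
assert clean == (raw[0] - filtered[0], raw[1] - filtered[1])

for name, (_, bases) in [("Raw", raw), ("Filtered", filtered), ("Clean", clean)]:
    print(name, depth(bases))
```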
Config file
```ini
[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = all # all, correct, assemble
rewrite = no # yes/no
deltmp = yes
parallel_jobs = 7 # number of tasks used to run in parallel - M/64 here, 64 can optimize to 32~64
input_type = raw # raw, corrected
read_type = clr # clr, ont, hifi
input_fofn = input.fofn
workdir = nexdenovo_assembly

[correct_option]
read_cutoff = 1k
genome_size = 2.5g # estimated genome size
sort_options = -m 40g -t 5 # -m TOTAL_INPUT_BASES * 1.2/4g -t P/pa_correction
minimap2_options_raw = -t 4 # -t P/parallel_jobs
pa_correction = 5 # M/(TOTAL_INPUT_BASES * 1.2/4)
correction_options = -p 3 # -p P/pa_correction

[assemble_option]
minimap2_options_cns = -t 4 # -t P/parallel_jobs
nextgraph_options = -a 1
```
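The inline comments in this config encode sizing rules. The Python sketch below evaluates them under two assumptions: M is the total memory available in Gb, and P is the number of CPU cores. It reads `1.2/4g` as "TOTAL_INPUT_BASES × 1.2 / 4, expressed in Gb", which gives about 40 Gb for the ~134 Gb of raw input, in line with the `-m 40g` used here. The function `suggest_options` and its rounding choices are hypothetical and are not NextDenovo's own logic.

```python
import math


def suggest_options(total_input_bases: int, mem_gb: int, cores: int, parallel_jobs: int) -> dict:
    """Evaluate the config-comment formulas (mem_gb = M, cores = P; assumed meanings)."""
    # sort_options -m: TOTAL_INPUT_BASES * 1.2 / 4, in whole Gb (rounded up to avoid under-allocating)
    sort_mem_gb = math.ceil(total_input_bases * 1.2 / 4 / 1e9)
    # pa_correction: M / (TOTAL_INPUT_BASES * 1.2 / 4), at least one task
    pa_correction = max(1, mem_gb // sort_mem_gb)
    # sort -t and correction -p: P / pa_correction
    threads_per_task = max(1, cores // pa_correction)
    # minimap2 -t: P / parallel_jobs
    minimap2_threads = max(1, cores // parallel_jobs)
    return {
        "sort_options": f"-m {sort_mem_gb}g -t {threads_per_task}",
        "pa_correction": pa_correction,
        "correction_options": f"-p {threads_per_task}",
        "minimap2_options_raw": f"-t {minimap2_threads}",
        "minimap2_options_cns": f"-t {minimap2_threads}",
    }
```

Because the sketch rounds up, the raw input in the table yields `-m 41g`, whereas this config uses `-m 40g`.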
Operating system
```
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.4.1708 (Core)
Release: 7.4.1708
Codename: Core
```
GCC
```
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/5.4.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-5.4.0/configure --enable-languages=c,c++ --disable-multilib
Thread model: posix
gcc version 5.4.0 (GCC)
```
Python Python 3.8.5
NextDenovo nextDenovo v2.4.0
To Reproduce (Optional)
In my past attempts I was able to run the software successfully with both PacBio CLR and PacBio HiFi reads, so I don't think I can recreate the problem with a smaller dataset. Sorry :/
Additional context (Optional)
Note that it did not run out of memory, so I am a bit puzzled about the cause of this error.
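A segmentation fault and an out-of-memory kill leave different signatures. A crash with SIGSEGV is signal 11, while the Linux OOM killer terminates processes with SIGKILL (signal 9). A shell reports these as exit statuses 139 and 137 (128 + signal number), and Python's `subprocess` reports them as negative return codes. The following is a minimal sketch for re-running the `nextgraph` command from `nextDenovo.sh.e` outside the pipeline and classifying how it ended. `describe_exit` and `run_and_describe` are hypothetical helpers.

```python
import signal
import subprocess
from typing import List


def describe_exit(returncode: int) -> str:
    """Describe an exit status.

    subprocess reports death by signal as a negative returncode (-11 for SIGSEGV);
    POSIX shells report it as 128 + signal number (139 for SIGSEGV, 137 for SIGKILL).
    """
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        # Shell convention; a program could also exit with a plain status above 128.
        signum = returncode - 128
    else:
        return f"exited with status {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = f"signal {signum}"
    if signum == signal.SIGSEGV:
        return f"killed by {name} (segmentation fault; the OOM killer sends SIGKILL instead)"
    if signum == signal.SIGKILL:
        return f"killed by {name} (one possible cause is the kernel OOM killer)"
    return f"killed by {name}"


def run_and_describe(cmd: List[str]) -> str:
    """Run a command to completion and describe how it ended."""
    completed = subprocess.run(cmd)
    return describe_exit(completed.returncode)
```

For example, call `run_and_describe(shlex.split(line))` from inside the `ctg_graph0` directory, where `line` is the `/NextDenovo/bin/nextgraph ...` command from `nextDenovo.sh.e` without the trailing `;`.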
In the past, I successfully ran NextDenovo (same version) on a similar genome without this problem. The only difference I can see between the two projects is that previously I provided the reads in FASTA format (one file), whereas now I used FASTQ (two files). In both cases the reads were PacBio CLR.
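Since the switch from one FASTA file to two FASTQ files is the only difference noted, one hypothetical sanity check is to confirm that each FASTQ file is structurally well formed. That means four-line records, an `@` header, a `+` separator, and a quality string as long as the sequence, which would catch a truncated file, for example. This sketch assumes unwrapped four-line FASTQ, either plain or `.gz`. It does not establish that the input format caused the crash, and `check_fastq_lines` and `check_fastq_file` are illustrative names.

```python
import gzip
from itertools import islice
from typing import Iterable, List


def check_fastq_lines(lines: Iterable[str]) -> List[str]:
    """Return problems found in four-line FASTQ records (empty list = no problems)."""
    problems = []
    it = iter(lines)
    record = 0
    while True:
        chunk = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not chunk:
            break
        record += 1
        if len(chunk) < 4:
            problems.append(f"record {record}: truncated ({len(chunk)} of 4 lines)")
            break
        header, seq, plus, qual = chunk
        if not header.startswith("@"):
            problems.append(f"record {record}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {record}: separator line does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {record}: sequence length {len(seq)} != quality length {len(qual)}")
    return problems


def check_fastq_file(path: str) -> List[str]:
    """Open a plain or gzip-compressed (.gz) FASTQ file and check its records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return check_fastq_lines(handle)
```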
I found a similar issue on GitHub (https://github.com/Nextomics/NextDenovo/issues/86), but unfortunately no solution was posted there.
Would you kindly let me know if you managed to solve this issue?
Also, I have never been able to restart a stopped job. How can I do this? The FAQ says to “simply run the same command”, but when I tried this, it created a backup of all the previous folders and started the assembly all over again from the beginning.
I hope you can help me.
Cheers,
André
see #113
rewrite = no # yes/no
means NextDenovo cannot overwrite the existing work directory, so it has to create a backup of all the previous folders and start the assembly all over again from the beginning.
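To confirm which value a given config actually sets for `rewrite`, the option can be read with Python's standard `configparser`, whose `getboolean` accepts `yes`/`no`. The following is a minimal sketch. It assumes the config is INI-style with sections such as `[General]` and `#` inline comments preceded by whitespace, and `read_rewrite` is a hypothetical helper.

```python
import configparser


def read_rewrite(cfg_text: str) -> bool:
    """Return the [General] rewrite setting from NextDenovo-style config text."""
    # Strip "# ..." inline comments such as "rewrite = no # yes/no"
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(cfg_text)
    # getboolean() accepts yes/no, true/false, on/off, 1/0
    return parser.getboolean("General", "rewrite")
```

For a config on disk, pass its contents, e.g. `read_rewrite(open("run.cfg").read())`, where `run.cfg` is an assumed filename.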