Long-read file input format problem

Open 87joe opened this issue 5 years ago • 1 comments

I used CONSENT v2.0 to correct nanopore long reads, but every run ended in a core dump. I traced the source code and found that the function indexReads in utils.cpp is the cause. This function reads the reads file assuming the pattern "1 header, 1 seq", i.e. each header line is followed by exactly one sequence line, e.g.:

>seq001
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAGGAAATAAAGTAAATTTTTGTTGTTGTACTTCGTTCAGTTTGGGTGTTTAACCAGATGTCGCCTACCGTGACAAGAAAGTTGAAAGAAAATAAGAAAATACGGCGCTGTCGCGGTTCGAACCACAGACCTTGACCCCCAGCAATATCAGCACCAACGAAACACAAGACACCGACAACTTTCTTGTC

After I converted my FASTA file to this pattern, CONSENT worked. My original FASTA file wrapped every sequence at 60 characters per line:

>seq001
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAGGAAATAAAGTAAATTTTTGTT
GTTGTACTTCGTTCAGTTTGGGTGTTTAACCAGATGTCGCCTACCGTGACAAGAAAGTTG
AAAGAAAATAAGAAAATACGGCGCTGTCGCGGTTCGAACCACAGACCTTGACCCCCAGCA
ATATCAGCACCAACGAAACACAAGACACCGACAACTTTCTTGTC

However, when I corrected another nanopore long reads file, I hit a new problem: minimap2 core dumps. Running minimap2 on the original (wrapped) format works, but CONSENT core dumps on that format.

To summarize:
first file is 1G, modified format: all good
second file is 13G, modified format: minimap2 core dump
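For anyone hitting the same crash, the conversion described above (wrapped FASTA to one header line plus one sequence line per read) can be sketched in Python roughly as follows. This is a minimal sketch, not part of CONSENT: the function name unwrap_fasta is made up for illustration, and it assumes plain uncompressed FASTA where every record starts with a ">" header line.

```python
def unwrap_fasta(src, dst):
    """Copy FASTA records from src to dst, joining each record's
    wrapped sequence lines into a single line.

    src and dst are open text streams. unwrap_fasta is a hypothetical
    helper name, not a CONSENT or minimap2 function.
    """
    seq_parts = []        # sequence chunks of the record being read
    have_record = False   # becomes True once the first header is seen
    for line in src:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if have_record:
                dst.write("".join(seq_parts) + "\n")
            dst.write(line + "\n")
            seq_parts = []
            have_record = True
        elif line and have_record:
            # Blank lines and any text before the first header are skipped.
            seq_parts.append(line)
    # Flush the last record.
    if have_record:
        dst.write("".join(seq_parts) + "\n")
```

It reads the input line by line and holds only one read in memory at a time, so it should also be usable on a file of 13G. A possible call, with placeholder file names: open "reads.fasta" for reading and "reads.oneline.fasta" for writing, then pass both handles to unwrap_fasta.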

May 27 '20 02:05 87joe

Hi,

This is a weird error. I have never encountered any issue with Minimap2 and a "1 header, 1 seq" reads file. Did you try running Minimap2 alone on your 13G long reads file to see what happens?

Also, does your file contain extremely long or extremely short reads? That'd be weird, but maybe it could cause the issue.
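One way to check the read lengths is a small script along these lines. It is only a sketch under assumptions: the function names are made up for illustration, the input is assumed to be uncompressed FASTA, and it does not encode any actual length limit of CONSENT or Minimap2. It only reports what is in the file.

```python
def fasta_read_lengths(src):
    """Yield (header, length) for each FASTA record in an open text stream.

    Works for both wrapped and "1 header, 1 seq" files.
    Hypothetical helper, not part of CONSENT or Minimap2.
    """
    header = None
    length = 0
    for line in src:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, length
            header = line[1:]
            length = 0
        elif header is not None:
            # Sequence line: add its length to the current record.
            length += len(line)
    if header is not None:
        yield header, length


def summarize_read_lengths(src):
    """Return read count, shortest and longest read length, or None if empty."""
    reads = 0
    shortest = None
    longest = 0
    for _, length in fasta_read_lengths(src):
        reads += 1
        shortest = length if shortest is None else min(shortest, length)
        longest = max(longest, length)
    if reads == 0:
        return None
    return {"reads": reads, "shortest": shortest, "longest": longest}
```

Passing an open handle to the 13G file to summarize_read_lengths would show whether it contains empty or unusually long reads. Lengths are tracked as running values, so memory use stays small.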

Jun 22 '20 15:06 morispi