NOVOPlasty excess memory usage using merged read files

I have two runs of the same Illumina library and merged them using 'cat' prior to running NOVOPlasty. The first run's reads did not assemble a mitogenome, so we sequenced a second run of the same library, which also did not assemble a mitogenome. I was hoping that merging the two runs would give sufficient coverage. However, when I run NOVOPlasty on the merged files, memory usage steadily increases over 1-2 minutes during the 'Retrieve Seed' step until it reaches 100%, and then the process is killed. The log file is empty. Attached are the first 40 lines of the read files, and the config file, so you can see what the read files look like.

I have 7.7 GiB of memory and have successfully run NOVOPlasty on read files a thousand times larger. The individual read files are 40-45 MB, and the merged files are ~85 MB.

Here is the command I used to merge the files:

cat TK24928-2-first.fastq TK24928-2-second.fastq > mergedTK24928-2.fastq

Thanks! Russell

TK24928-1-first-40.fastq.txt TK24928-1-second-40.fastq.txt TK24928-2-first-40.fastq.txt TK24928-2-second-40.fastq.txt config.txt

Jun 09 '20 21:06 rspfau
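One quick sanity check on files merged with 'cat' is to confirm that the merged file holds exactly as many FASTQ records as the two inputs combined. The following is a minimal sketch, not part of NOVOPlasty; it assumes uncompressed FASTQ with four lines per record, and the function names `count_fastq_records` and `check_merge` are made up for illustration. A line count that is not a multiple of four can indicate, for example, that an input file lacked a final newline, so two lines were fused during concatenation.

```python
def count_fastq_records(path):
    """Count records in an uncompressed FASTQ file (assumes 4 lines per record)."""
    n_lines = 0
    with open(path, "r") as handle:
        for _ in handle:
            n_lines += 1
    if n_lines % 4 != 0:
        # A fused or truncated record breaks the 4-line structure.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def check_merge(first, second, merged):
    """Return True if `merged` has as many records as `first` plus `second`."""
    expected = count_fastq_records(first) + count_fastq_records(second)
    return count_fastq_records(merged) == expected
```

With the file names from the post, a call could look like `check_merge("TK24928-2-first.fastq", "TK24928-2-second.fastq", "mergedTK24928-2.fastq")`. Running the same count on the forward and reverse merged files also shows whether the two files contain the same number of reads.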

Hi,

That must be a bug; the 'Retrieve Seed' step shouldn't take any memory.

Did you have this problem with the latest version?

Jun 09 '20 21:06 ndierckx

NOVOPlasty3.8.3.pl

Jun 09 '20 21:06 rspfau

Could you try the latest version just to see if the problem is still there?

Jun 09 '20 21:06 ndierckx

Yes, same result:

[email protected]@TSU98054-LX:~/Desktop/second attempt geomys mitogenomes/TK24928/merged 2nd attempt$ perl ../../NOVOPlasty4.0.pl -c config.txt

NOVOPlasty: The Organelle Assembler
Version 4.0
Author: Nicolas Dierckxsens, (c) 2015-2020

Input parameters from the configuration file:
*** Verify if everything is correct ***

Project:

Project name = TK24928merged
Type = mito
Genome range = 15000-17000
K-mer = 30
Max memory = 3
Extended log = 1
Save assembled reads = yes
Seed Input = ../../Geomys-pinetis-cytb-seed
Extend seed directly = no
Reference sequence =
Variance detection =
Chloroplast sequence =

Dataset 1:

Read Length = 301
Insert size = 412
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = mergedTK24928-1.fastq
Reverse reads = mergedTK24928-2.fastq

Heteroplasmy:

Heteroplasmy =
HP exclude list =
PCR-free =

Optional:

Insert size auto = yes
Use Quality Scores =

Reading Input......OK

Building Hash Table......OK

Subsampled fraction: 99.84 %
Forward reads without pair: 466
Reverse reads without pair: 342

Retrieve Seed...

Jun 09 '20 21:06 rspfau

Here are the complete merged read files: https://drive.google.com/file/d/1awqUKHKj_z24W5To_jUE38MmhhuQFNLs/view?usp=sharing

Jun 09 '20 22:06 rspfau

I need access to them; I have sent a request.

Jun 09 '20 22:06 ndierckx

Hi, could you also send the seed you used?

Jun 10 '20 11:06 ndierckx

Yes, here it is

Geomys-pinetis-cytb-seed.txt

Jun 10 '20 14:06 rspfau

I tried a different seed, which I obtained by doing a local BLAST of the Illumina reads, and it worked without the memory issue, but there is still not enough depth to assemble the mitogenome :(

New seed attached here: TK24928Pfau_2373363-trimmedends.txt

Jun 10 '20 14:06 rspfau

I've also merged several other read files and haven't had any problems. It was somehow the combination of that particular merged read file and that particular seed file.

Jun 10 '20 16:06 rspfau

Hi,

I still had to find the bug, but that is solved now; I will upload the new version now.

I tried the assembly and the coverage is indeed too low, but I saw that the reads are heavily trimmed. I would advise against that, since you lose a lot of data that way...

It is also best to lower the k-mer to 21 or so, which works better when coverage is low, but more importantly, do not trim.

And it is best not to get 300 bp Illumina reads, as they have very low quality; best to stick to 250!

Jun 10 '20 20:06 ndierckx
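To get a rough idea of how heavily a read file has been trimmed, the read-length distribution can be summarized before running the assembly. The sketch below is illustrative only and is not part of NOVOPlasty; it assumes uncompressed FASTQ with four lines per record, the function names `read_lengths` and `summarize` are hypothetical, and the default `full_length=301` is taken from the "Read Length = 301" setting in the config shown above.

```python
def read_lengths(path):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ file."""
    with open(path, "r") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record is the sequence
                yield len(line.rstrip("\r\n"))


def summarize(path, full_length=301):
    """Summarize read lengths; `full_length` is the untrimmed read length (assumed)."""
    lengths = list(read_lengths(path))
    if not lengths:
        raise ValueError(f"{path}: no reads found")
    n_full = sum(1 for n in lengths if n >= full_length)
    return {
        "reads": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        "fraction_full_length": n_full / len(lengths),
    }
```

A low `fraction_full_length` together with a mean well below the sequenced read length would be consistent with heavy trimming; what counts as "too heavy" is a judgment call and is not defined by this sketch.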
