excess memory usage using merged read files

Open rspfau opened this issue 4 years ago • 11 comments

I have two runs of the same Illumina library and merged them using 'cat' before running NOVOPlasty. The first run's reads did not assemble a mitogenome, so we sequenced a second run of the same library, which also did not assemble a mitogenome. I was hoping that merging the two runs would give sufficient coverage. However, when I run NOVOPlasty on the merged files, memory usage climbs steadily for 1-2 minutes during the 'Retrieve Seed' step until it reaches 100%, and then the process is killed. The log file is empty. Attached are the first 40 lines of each read file and the config file, so you can see what the read files look like.

I have 7.7 GiB of memory and have successfully run NOVOPlasty on read files a thousand times larger. The individual read files are 40-45 MB each, and the merged files are ~85 MB each.

Here's the command I used to merge the files: cat TK24928-2-first.fastq TK24928-2-second.fastq > mergedTK24928-2.fastq

Thanks! Russell

TK24928-1-first-40.fastq.txt TK24928-1-second-40.fastq.txt TK24928-2-first-40.fastq.txt TK24928-2-second-40.fastq.txt config.txt

rspfau avatar Jun 09 '20 21:06 rspfau
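When merging paired-end runs with 'cat', the forward and reverse files need to be concatenated in the same run order so that mates stay aligned. A minimal Python sketch of the same merge, with a record-count check afterwards, might look like the following. The function names are hypothetical, and the sketch assumes uncompressed FASTQ with exactly four lines per record; it is not part of NOVOPlasty.

```python
import shutil


def merge_fastq(inputs, output):
    """Concatenate FASTQ files in the given order (same effect as `cat a b > out`)."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_records(path):
    """Count records, assuming plain 4-line FASTQ (no wrapped sequence lines)."""
    with open(path, "rb") as handle:
        lines = sum(1 for _ in handle)
    if lines % 4:
        # A missing trailing newline in an input file also ends up here,
        # because two lines get fused when the files are joined.
        raise ValueError(f"{path}: {lines} lines is not a multiple of 4")
    return lines // 4


def check_pair_counts(forward, reverse):
    """Return (forward_count, reverse_count, counts_match) for a merged pair."""
    n_forward = count_records(forward)
    n_reverse = count_records(reverse)
    return n_forward, n_reverse, n_forward == n_reverse
```

For the files in this thread, one would call `merge_fastq` once for the forward files and once for the reverse files (first run, then second run in both cases), then pass the two merged files to `check_pair_counts`. Unequal counts are not necessarily fatal, since the log below reports a few hundred reads without a pair, but a large mismatch would suggest the runs were merged in a different order or that a file is truncated.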

Hi,

That must be a bug; the 'Retrieve Seed' step shouldn't take any memory.

Did you have this problem with the latest version?

ndierckx avatar Jun 09 '20 21:06 ndierckx

NOVOPlasty3.8.3.pl

rspfau avatar Jun 09 '20 21:06 rspfau

Could you try the latest version just to see if the problem is still there?

ndierckx avatar Jun 09 '20 21:06 ndierckx

Yes, same result:

TSU98054-LX:~/Desktop/second attempt geomys mitogenomes/TK24928/merged 2nd attempt$ perl ../../NOVOPlasty4.0.pl -c config.txt


NOVOPlasty: The Organelle Assembler
Version 4.0
Author: Nicolas Dierckxsens, (c) 2015-2020

Input parameters from the configuration file: *** Verify if everything is correct ***

Project:

Project name = TK24928merged
Type = mito
Genome range = 15000-17000
K-mer = 30
Max memory = 3
Extended log = 1
Save assembled reads = yes
Seed Input = ../../Geomys-pinetis-cytb-seed
Extend seed directly = no
Reference sequence =
Variance detection =
Chloroplast sequence =

Dataset 1:

Read Length = 301
Insert size = 412
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = mergedTK24928-1.fastq
Reverse reads = mergedTK24928-2.fastq

Heteroplasmy:

Heteroplasmy =
HP exclude list =
PCR-free =

Optional:

Insert size auto = yes
Use Quality Scores =

Reading Input......OK

Building Hash Table......OK

Subsampled fraction: 99.84 %
Forward reads without pair: 466
Reverse reads without pair: 342

Retrieve Seed...

rspfau avatar Jun 09 '20 21:06 rspfau

Here are the complete merged read files: https://drive.google.com/file/d/1awqUKHKj_z24W5To_jUE38MmhhuQFNLs/view?usp=sharing

rspfau avatar Jun 09 '20 22:06 rspfau

I need access to them; I have sent a request.

ndierckx avatar Jun 09 '20 22:06 ndierckx

Hi, could you also send the seed you used?

ndierckx avatar Jun 10 '20 11:06 ndierckx

Yes, here it is

Geomys-pinetis-cytb-seed.txt

rspfau avatar Jun 10 '20 14:06 rspfau

I tried a different seed, which I obtained by doing a local BLAST of the Illumina reads, and it worked without the memory issue, but there is still not enough depth to assemble the mitogenome :(

New seed attached here: TK24928Pfau_2373363-trimmedends.txt

rspfau avatar Jun 10 '20 14:06 rspfau

I've also merged several other sets of read files and haven't had any problems. It was somehow the combination of that particular merged read file and that particular seed file.

rspfau avatar Jun 10 '20 16:06 rspfau

Hi,

I still had to track down the bug, but it is fixed now; I will upload the new version right away.

I tried the assembly and the coverage is indeed too low, but I saw that the reads are heavily trimmed. I would advise against doing that; you lose a lot of data that way...

It is also best to lower the k-mer to 21 or so, which works better when coverage is low, but more importantly, do not trim.

It is also best not to get 300 bp Illumina reads, as they have very low quality; best to stick to 250 bp!

ndierckx avatar Jun 10 '20 20:06 ndierckx
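To see how heavily a read file has been trimmed, one option is to tabulate read lengths and check what fraction falls below a given length, for example the configured K-mer value (a read shorter than k cannot contain any k-mer of that length). The sketch below is an illustration only: the function names are hypothetical, and it assumes uncompressed 4-line FASTQ.

```python
from collections import Counter


def read_length_histogram(path):
    """Return a Counter mapping read length to number of reads (4-line FASTQ assumed)."""
    lengths = Counter()
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                lengths[len(line.rstrip("\r\n"))] += 1
    return lengths


def fraction_shorter_than(lengths, threshold):
    """Fraction of reads whose length is strictly below `threshold`."""
    total = sum(lengths.values())
    if total == 0:
        return 0.0
    short = sum(count for length, count in lengths.items() if length < threshold)
    return short / total
```

Running `read_length_histogram` on a merged read file and comparing `fraction_shorter_than(lengths, 30)` with `fraction_shorter_than(lengths, 21)` gives a rough idea of how many reads are too short to contribute at each of the two K-mer settings mentioned in this thread.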