NOVOPlasty
excess memory usage using merged read files
I have two runs of the same Illumina library and merged them using 'cat' before running NOVOPlasty. The first run's reads did not assemble a mitogenome, so we sequenced a second run of the same library, which also did not assemble one. I was hoping that merging the two runs would give sufficient coverage. However, when I run NOVOPlasty on the merged files, memory usage climbs steadily for 1-2 minutes during the 'Retrieve Seed' step until it reaches 100%, and then the process is killed. The log file is empty. Attached are the first 40 lines of each read file and the config file, so you can see what the read files look like.
I have 7.7 GiB of memory and have successfully run NOVOPlasty on read files a thousand times larger. The individual read files are 40-45 MB, and the merged files are ~85 MB.
Here's the command I used to merge the files: cat TK24928-2-first.fastq TK24928-2-second.fastq > mergedTK24928-2.fastq
Thanks! Russell
TK24928-1-first-40.fastq.txt TK24928-1-second-40.fastq.txt TK24928-2-first-40.fastq.txt TK24928-2-second-40.fastq.txt config.txt
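For anyone reproducing this, a quick sanity check of the concatenated files can rule out a broken merge before looking at the assembler. The following is a minimal sketch in Python, not part of NOVOPlasty: the function names `fastq_read_ids` and `pairing_summary` are hypothetical, and it assumes strict 4-line FASTQ records with mate IDs that match up to the first whitespace or a trailing /1 or /2.

```python
from itertools import islice


def fastq_read_ids(path):
    """Return the read IDs of a FASTQ file, assuming strict 4-line records."""
    ids = []
    with open(path) as handle:
        while True:
            record = list(islice(handle, 4))
            if not record:
                break
            # A truncated record or a missing '@' / '+' line suggests a bad merge.
            if (len(record) < 4
                    or not record[0].startswith("@")
                    or not record[2].startswith("+")):
                raise ValueError(
                    f"malformed FASTQ record near read {len(ids) + 1} in {path}"
                )
            # Assumption: the ID is the header text up to the first whitespace,
            # optionally followed by a /1 or /2 mate suffix.
            fields = record[0][1:].split()
            read_id = fields[0] if fields else ""
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            ids.append(read_id)
    return ids


def pairing_summary(forward_path, reverse_path):
    """Count reads in each file and reads whose mate is missing from the other."""
    forward_ids = fastq_read_ids(forward_path)
    reverse_ids = fastq_read_ids(reverse_path)
    forward_set, reverse_set = set(forward_ids), set(reverse_ids)
    return {
        "forward_reads": len(forward_ids),
        "reverse_reads": len(reverse_ids),
        "forward_without_pair": len(forward_set - reverse_set),
        "reverse_without_pair": len(reverse_set - forward_set),
    }
```

Calling `pairing_summary("mergedTK24928-1.fastq", "mergedTK24928-2.fastq")` would report read counts and unpaired reads for the two merged files; a `ValueError` would point at a record that was cut off where the two runs were joined.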
Hi,
That must be a bug; retrieving the seed shouldn't take any memory.
Did you have this problem with the latest version?
NOVOPlasty3.8.3.pl
Could you try the latest version just to see if the problem is still there?
Yes, same result:
[email protected]@TSU98054-LX:~/Desktop/second attempt geomys mitogenomes/TK24928/merged 2nd attempt$ perl ../../NOVOPlasty4.0.pl -c config.txt
NOVOPlasty: The Organelle Assembler
Version 4.0
Author: Nicolas Dierckxsens, (c) 2015-2020
Input parameters from the configuration file:
*** Verify if everything is correct ***
Project:
Project name = TK24928merged
Type = mito
Genome range = 15000-17000
K-mer = 30
Max memory = 3
Extended log = 1
Save assembled reads = yes
Seed Input = ../../Geomys-pinetis-cytb-seed
Extend seed directly = no
Reference sequence =
Variance detection =
Chloroplast sequence =
Dataset 1:
Read Length = 301
Insert size = 412
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = mergedTK24928-1.fastq
Reverse reads = mergedTK24928-2.fastq
Heteroplasmy:
Heteroplasmy =
HP exclude list =
PCR-free =
Optional:
Insert size auto = yes
Use Quality Scores =
Reading Input......OK
Building Hash Table......OK
Subsampled fraction: 99.84 %
Forward reads without pair: 466
Reverse reads without pair: 342
Retrieve Seed...
Here are the complete merged read files https://drive.google.com/file/d/1awqUKHKj_z24W5To_jUE38MmhhuQFNLs/view?usp=sharing
I need access to them; I have sent a request.
Hi, could you also send the seed you used?
I tried a different seed, which I obtained by doing a local BLAST of the Illumina reads, and it worked without the memory issue--but there is still not enough depth to assemble the mitogenome :(
New seed attached here: TK24928Pfau_2373363-trimmedends.txt
And I've merged several other reads and haven't had any problems. It was somehow the combination of that particular merged read file and that particular seed file.
Hi,
I still had to find the bug, but it is fixed now; I will upload the new version now.
I tried the assembly and the coverage is indeed too low, but I saw that the reads are heavily trimmed. I would advise against that, since you lose a lot of data that way...
It is also best to lower the k-mer to 21 or so, which works better when coverage is low, but more importantly, do not trim.
And it is best not to get 300 bp Illumina reads, as they have very low quality; best to stick to 250!
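The advice about trimming and a lower k-mer can be illustrated with some generic k-mer arithmetic. This is a rough sketch in Python and says nothing about NOVOPlasty's internals: the function names are hypothetical, the read counts and lengths in the usage note are made-up illustration values, and the depth formula is the usual total-bases-over-genome-size estimate.

```python
def kmers_per_read(read_length, k):
    """Number of k-mers contained in one read of the given length."""
    return max(read_length - k + 1, 0)


def mean_depth(n_reads, mean_read_length, genome_size):
    """Expected mean depth: total sequenced bases divided by genome size."""
    return n_reads * mean_read_length / genome_size


# Hypothetical numbers, for illustration only.
print(kmers_per_read(301, 30))        # untrimmed 301 bp read, k = 30 -> 272
print(kmers_per_read(60, 30))         # read trimmed to 60 bp, k = 30 -> 31
print(kmers_per_read(60, 21))         # same trimmed read, k = 21 -> 40
print(mean_depth(1000, 150, 16000))   # 1000 reads of 150 bp on 16 kb -> 9.375
```

Under these assumptions, heavy trimming removes most of the k-mers a read could contribute, and a smaller k recovers some of them, which is consistent with both suggestions for a low-coverage dataset.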