Running time

Open Silverfoxcome opened this issue 5 years ago • 7 comments

Hi! I was wondering how long my data should take to finish running. I gave NOVOPlasty my raw dataset of two compressed fastq files (.fastq.gz) of 35.0 GB each. Uncompressed, they are almost 130 GB each. My PC has 147 GB of RAM and 12 processors.

How much time should I expect my process to last? (It has been running for 48 hours already)

Thank you in advance :)

Silverfoxcome, Jun 10 '19 13:06

Those are huge files, so you probably won't have enough memory. You should cancel the run.

Just use the max memory option; you don't need that much data at all. Put 10 or 15 as max memory, which should be more than enough. If you want to check for heteroplasmy, let me know and I will send you an extra script.

ndierckx, Jun 10 '19 16:06

> Those are huge files, so you probably won't have enough memory. You should cancel the run.
>
> Just use the max memory option; you don't need that much data at all. Put 10 or 15 as max memory, which should be more than enough. If you want to check for heteroplasmy, let me know and I will send you an extra script.

What does the program do when I set the max memory parameter to 10 or 15? Will it use less data from my datasets?

Yes please, I'd love to check for heteroplasmy as well!

Thanks for all your help :)

Edit: I just read what the max memory parameter does:

Max memory: You can choose a max memory usage, suitable to automatically subsample the data or when you have limited memory capacity. If you have sufficient memory, leave it blank; otherwise write your available memory in GB (for example, if you have an 8 GB RAM laptop, put down 7 or 7.5; don't add the unit in the config file).

So this will subsample my data! That's great :D
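To illustrate what subsampling paired-end data means in general, here is a minimal Python sketch that keeps every Nth read pair from two gzipped FASTQ files, so the forward and reverse files stay in sync. It is not NOVOPlasty's own subsampling code. The function name `subsample_pairs`, the `step` parameter, and the assumption of plain four-line FASTQ records are illustrative only.

```python
import gzip
from itertools import islice


def read_records(handle):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped 4-line records)."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return  # end of file (an incomplete trailing record is dropped)
        yield record


def subsample_pairs(fwd_in, rev_in, fwd_out, rev_out, step=10):
    """Keep every `step`-th read pair from two gzipped FASTQ files.

    Hypothetical helper for illustration; returns the number of pairs written.
    """
    kept = 0
    with gzip.open(fwd_in, "rt") as f1, gzip.open(rev_in, "rt") as f2, \
         gzip.open(fwd_out, "wt") as o1, gzip.open(rev_out, "wt") as o2:
        # zip() walks both files together so mates are never separated.
        pairs = zip(read_records(f1), read_records(f2))
        for index, (rec1, rec2) in enumerate(pairs):
            if index % step == 0:
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return kept
```

With `step=10`, roughly one tenth of the read pairs would be kept. The max memory option makes a manual step like this unnecessary for NOVOPlasty itself.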

My config file ended up like this:

Project:

Project name = test_chloro
Type = chloro
Genome Range = 120000-180000
K-mer = 39
Max memory = 15
Extended log = yes
Save assembled reads = no
Seed Input = /home/maiz/Documentos/2018-RS/maiz_morado/ZM1/Seed_RUBP.fasta
Reference sequence = /home/maiz/Documentos/2018-RS/maiz_morado/ZM1/cp_ref.fasta
Variance detection = no
Chloroplast sequence =

Dataset 1:

Read Length = 151
Insert size = 450
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = /home/maiz/Documentos/2018-RS/maiz_morado/ZM1/ZM1_R1.fastq.gz
Reverse reads = /home/maiz/Documentos/2018-RS/maiz_morado/ZM1/ZM1_R2.fastq.gz

Optional:

Insert size auto = yes
Insert Range = 1.9
Insert Range strict = 1.3
Use Quality Scores =
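As a quick sanity check before launching a long run, a config like this could be read back with a few lines of Python. This is only a sketch: it assumes the file is written with one `key = value` entry per line under section headers such as `Project:`, and the helper names `parse_config` and `check_max_memory` are hypothetical, not part of NOVOPlasty.

```python
def parse_config(text):
    """Parse 'key = value' lines into a dict keyed by (section, key)."""
    settings = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "=" in line:
            # Split on the first '=' only, so values may contain '='.
            key, _, value = line.partition("=")
            settings[(section, key.strip())] = value.strip()
        elif line.endswith(":"):
            section = line[:-1]
    return settings


def check_max_memory(settings):
    """Return max memory as a float, or None if the field is blank.

    Raises ValueError when the value is not a bare number (for example '15GB'),
    since the documentation quoted above says not to add the unit.
    """
    value = settings.get(("Project", "Max memory"), "")
    if value == "":
        return None
    return float(value)


example = """Project:
Project name = test_chloro
Max memory = 15

Dataset 1:
Read Length = 151
"""
settings = parse_config(example)
print(check_max_memory(settings))
```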

Thank you :)

Silverfoxcome, Jun 10 '19 16:06

Seems good. I will answer tomorrow about heteroplasmy.

ndierckx, Jun 10 '19 17:06

Still interested in heteroplasmy detection?

ndierckx, Jul 09 '19 18:07

Hi! Yes please! I'm still very interested in looking for heteroplasmy! Thank you so much for all your help :D

Silverfoxcome, Jul 09 '19 19:07

As you can't get a good complete assembly, I would test heteroplasmy on a smaller region. So make a fasta file with a sequence of maybe 30000 bp that doesn't contain the inverted repeats. Use that file as both the seed and the reference for the heteroplasmy detection.
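One way to cut such a region out of an existing reference is sketched below in plain Python. It assumes the reference FASTA holds a single sequence, and the helper names `read_single_fasta` and `write_region` are hypothetical. The start and end coordinates have to be chosen by hand so that the region avoids the inverted repeats; the script does not detect them.

```python
def read_single_fasta(path):
    """Return (header, sequence) for a FASTA file holding one sequence."""
    header = None
    parts = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    raise ValueError("expected a single-sequence FASTA file")
                header = line[1:]
            elif line:
                parts.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(parts)


def write_region(in_path, out_path, start, end, width=60):
    """Write bases start..end (1-based, inclusive) to a new FASTA file.

    Returns the length of the extracted region.
    """
    header, seq = read_single_fasta(in_path)
    if not 1 <= start <= end <= len(seq):
        raise ValueError("region lies outside the sequence")
    region = seq[start - 1:end]
    with open(out_path, "w") as out:
        out.write(f">{header} region {start}-{end}\n")
        # Wrap the sequence at `width` bases per line.
        for i in range(0, len(region), width):
            out.write(region[i:i + width] + "\n")
    return len(region)
```

A call such as `write_region("cp_ref.fasta", "region.fasta", 1, 30000)` would write the first 30000 bases; those coordinates are placeholders, not a recommendation for this genome.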

ndierckx, Jul 10 '19 02:07

> As you can't get a good complete assembly, I would test heteroplasmy on a smaller region. So make a fasta file with a sequence of maybe 30000 bp that doesn't contain the inverted repeats. Use that file as both the seed and the reference for the heteroplasmy detection.

I'll do that as soon as I can :D

I'm now assembling the mitochondria of purple maize :D

I was wondering how much max memory I should give NOVOPlasty so that it automatically subsamples the data?

Thanks for all your help!!!

Silverfoxcome, Aug 01 '19 21:08