MITObim
multiple seeds
Dear Chris, I'm wondering whether it's possible to provide MITObim with multiple seeds (say cox1 and nad6). With the former v. 1.8, I tried putting both genes in the same reference file, but it didn't work. Cheers, SG
Hi,
Sure, it's possible and it should work too. If you provide two seeds in the same fasta file, like
>seed1
ACGT
>seed2
GCGTA
then it will just extend both until no more reads can be added.
What it will not do, however, is put the results from the two seeds together automatically.
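A minimal sketch of how such a multi-seed file could be put together in Python. The function name `build_seed_fasta` is hypothetical and not part of MITObim, and the sequences are the placeholders from the example above:

```python
def build_seed_fasta(seeds):
    """Return multi-FASTA text with one record per (name, sequence) pair.

    `seeds` maps a seed name to its sequence. Hypothetical helper,
    not part of MITObim.
    """
    records = []
    for name, sequence in seeds.items():
        if not sequence:
            raise ValueError(f"seed {name!r} has an empty sequence")
        # One header line and one sequence line per seed.
        records.append(f">{name}\n{sequence}\n")
    return "".join(records)


if __name__ == "__main__":
    # Placeholder sequences; real seeds would be e.g. full cox1 and nad6 genes.
    print(build_seed_fasta({"seed1": "ACGT", "seed2": "GCGTA"}), end="")
```

The resulting text can be written to a single file and passed as the reference, so both seeds are extended in the same run.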
Once it's done with the iterations, say it finished after 42, I suggest you do a final iteration for which you use the denovo mode. So, -start 43 -end 43 -sample 'whatever you used before' -readpool 'readpool from iteration 42' --quick 'result from iteration 42' --denovo ..
Then it'll just assemble all reads, essentially ignoring the reference, and should put it all together correctly.
For this final denovo assembly of the mt reads you could of course use any other assembler outside of MITObim/MIRA.
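As a rough sketch, the options for that final de novo iteration could be collected like this. Only the options named above are included, so anything else the run needs (whatever the ".." stands for) still has to be added by hand. The executable name `MITObim.pl`, the helper `final_denovo_args` and the file names in the usage part are assumptions for illustration:

```python
import shlex


def final_denovo_args(last_iteration, sample, readpool, quick_reference,
                      executable="MITObim.pl"):
    """Build the argument list for one extra iteration in de novo mode.

    Hypothetical helper: it only arranges the options quoted in this
    thread; the executable name is an assumption and may differ
    between MITObim versions.
    """
    next_iteration = str(last_iteration + 1)
    return [
        executable,
        "-start", next_iteration,
        "-end", next_iteration,
        "-sample", sample,            # same sample name as in earlier iterations
        "-readpool", readpool,        # readpool from the last iteration
        "--quick", quick_reference,   # result from the last iteration
        "--denovo",
    ]


if __name__ == "__main__":
    # File names below are made up for the example.
    args = final_denovo_args(42, "mysample", "readpool-it42.fastq", "result-it42.fasta")
    print(shlex.join(args))
```

Printing the command rather than running it makes it easy to check before launching the actual assembly.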
Hope that helps!
cheers, Christoph
Thanks Christoph, I just realized the stupid reason why my command wasn't successful. Anyway, I have a couple of questions. Why is it that when I use nad6 as the seed, the program correctly reconstructs the mt genome of my bug (32 iterations, ~25k length), while this doesn't happen when I use cox1? With cox1, MITObim reaches the stationary state after 29 iterations (~16k length).
These are final summary logs for each test:
cox1
readpool contains 64383 reads
assembly contains 1 contig(s)
contig length: 16616
MITObim has reached a stationary read number after 29 iterations!!
nad6
readpool contains 107667 reads
assembly contains 1 contig(s)
contig length: 25433
MITObim has reached a stationary read number after 32 iterations!!
cox1+nad6
readpool contains 107683 reads
assembly contains 2 contig(s)
min contig length: 12445 bp
max contig length: 13299 bp
avg contig length: 12872 bp
MITObim has reached a stationary read number after 20 iterations!!
The cox1+nad6 run shows almost the same readpool size as the nad6 run.
Secondly, I am also interested in removing the mt reads from the initial dataset, so that I can assemble the nuclear genome from mt-free data. I found a reply you gave in the Google group:
if you want to extract these reads you can do it e.g. like this (assumes you are in the iterationXX directory):
cat samplename-refname_assembly/samplename-refname_d_info/samplename-refname_info_contigreadlist.txt | grep "#" -v | cut -f 2 > readlist.txt
miraconvert -n readlist.txt samplename-readpool-itxx.fastq iterationXX-reads-used-in-assembly
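The read-list step of the pipeline above, and one way of dropping those reads from a FASTQ file, might look like this in Python. This is only a sketch under assumptions: the function names are made up, the FASTQ is assumed to have exactly four lines per record, and mates are assumed to be named with `/1` and `/2` suffixes. Whether mates should be dropped as well is left as a switch (`drop_mates`) rather than decided here:

```python
def read_names_from_contigreadlist(lines):
    """Collect read names: skip lines containing '#', take the 2nd tab field.

    Mirrors `grep "#" -v | cut -f 2`, except that lines without a tab
    are skipped instead of being passed through whole.
    """
    names = set()
    for line in lines:
        if "#" in line:
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1]:
            names.add(fields[1])
    return names


def strip_mate_suffix(name):
    """Remove a trailing /1 or /2 (assumed mate naming convention)."""
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2]
    return name


def filter_fastq(fastq_lines, names_to_drop, drop_mates=False):
    """Yield the lines of all 4-line FASTQ records that are not dropped."""
    if drop_mates:
        drop = {strip_mate_suffix(n) for n in names_to_drop}
    else:
        drop = set(names_to_drop)
    record = []
    for line in fastq_lines:
        record.append(line)
        if len(record) == 4:
            header = record[0][1:].split()
            name = header[0] if header else ""
            key = strip_mate_suffix(name) if drop_mates else name
            if key not in drop:
                yield from record
            record = []
```

In practice the two functions would be fed open file handles, for example the contigreadlist file and the original read file, with the output written to a new FASTQ.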
Based on this, how should I treat the SE reads that were used to reconstruct the mt genome? Should I remove the pairs they belong to as well? Thanks in advance
Stefano