Contigs not circularizing
Hi Ryan,
First of all, thank you for creating such amazing tools! I’ve been using Unicycler and Trycycler, and I’m now trying Minipolish, for many things. They have been especially useful for plasmids!
Currently, I’m trying to adapt a script for a metagenomic sample, specifically working on circularizing viral contigs. I think Minipolish could be really helpful for this, but I’m running into an issue. After applying the Minipolish pipeline to the reads post-cleaning and depletion (human and bacterial reads), I ended up with only 6 contigs with varying depths (16X, 32X, 33X, 12X, 14X, and 30X). However, the polished GFA doesn’t connect any nodes; it just gives me 6 linear, unconnected contigs. All of them have sizes typical of this virus species.
What’s strange is that when I align these against a database, they seem to have been assembled correctly, as they match the reference with almost 100% coverage. For example, the 14X contig has 99% coverage and 95.83% identity against the reference OP882566.1 (Torque teno virus 16 isolate AF020 ORF1 gene, complete cds). Do you have any insights into how I could help circularize these contigs?
It is worth mentioning that the protocol was adapted for nanopore sequencing, but it is giving shorter contigs than expected.
Another approach I’m considering is using contigs from different assemblers, as I have many available in my script (it runs SPAdes, SAVAGE, Haploflow and MEGAHIT in parallel). I’ve tried using Circlator with no success. However, when using Minimap2 (the first step of Minipolish), I see that the PAF file contains some potentially circularized contigs from my FASTA file, but I don’t know how to continue working with the PAF file. Do you know how I can use it to circularize these FASTA sequences?
Any guidance you can provide would be greatly appreciated!
Best regards, Florencia Martino
Hi Florencia,
I'm curious whether this is a problem with the miniasm assembly or with the Minipolish polishing. Does the miniasm assembly graph (i.e. before running Minipolish) contain 6 unconnected linear sequences? If so, this seems like a miniasm problem, and your best bet might be to try other assemblers (e.g. Flye) to see if they can produce circularised sequences.
If, however, the miniasm graph does have circularised contigs but the post-Minipolish graph does not, then something may be going wrong with Minipolish.
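To compare the two graphs quickly, here is a minimal Python sketch that lists which segments in a GFA file look circular. It assumes GFA1 formatting and that a circular contig is represented as a segment with a link back to itself in the same orientation; the function name `circular_segments` and the file names in the usage note are hypothetical, so adjust them if your graphs look different.

```python
def circular_segments(gfa_lines):
    """Map each segment name to True if it has a same-orientation self-link.

    Assumes GFA1: 'S' lines hold segments, 'L' lines hold links with the
    fields from-name, from-orientation, to-name, to-orientation, overlap.
    """
    segments = []
    circular = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 2:
            segments.append(fields[1])
        elif fields[0] == "L" and len(fields) >= 5:
            # A link from a segment to itself, same strand on both ends,
            # is treated here as the marker of a circular contig.
            if fields[1] == fields[3] and fields[2] == fields[4]:
                circular.add(fields[1])
    return {name: name in circular for name in segments}
```

Running it on both graphs, e.g. `circular_segments(open("miniasm.gfa"))` and `circular_segments(open("polished.gfa"))`, should show whether circularity is missing from the start or lost during polishing.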
Regarding other approaches for circularising your sequences, Trycycler could be useful, assuming you can generate high-quality assemblies. It circularises sequences by comparing alternative assemblies to each other. For example, if you had 10 alternative assemblies of the same circular viral sequence, each with different starting positions, then Trycycler can create a cleanly circularised (no gap, no overlap) consensus.
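On the PAF question, one rough option is to look for self-alignments where the start of a contig matches its own end, which suggests the assembler ran past the circular junction, and then trim the duplicated tail. The Python sketch below is only an illustration under some assumptions: the PAF comes from aligning the contigs against themselves, the standard 12 PAF columns are present, and the names `terminal_overlaps`, `trim_end` and the `slack` tolerance are hypothetical. It is not what Minipolish or Trycycler do internally, and the trim point is approximate because it ignores indels inside the overlap.

```python
def terminal_overlaps(paf_lines, slack=20):
    """Return {contig: bases_to_trim_from_end} for start-to-end self-hits.

    PAF columns used: query name, query length, query start, query end,
    strand, target name, target length, target start, target end.
    """
    trims = {}
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue
        qname, qlen, qstart, qend, strand = f[0], int(f[1]), int(f[2]), int(f[3]), f[4]
        tname, tstart, tend = f[5], int(f[7]), int(f[8])
        if qname != tname or strand != "+":
            continue  # only same-strand hits of a contig against itself
        # Order the two intervals so 'a' is the one nearer the contig start.
        (a_start, a_end), (b_start, b_end) = sorted([(qstart, qend), (tstart, tend)])
        if (a_start, a_end) == (b_start, b_end):
            continue  # trivial hit of a region against itself
        # 'a' must touch the contig start, 'b' the contig end, with no overlap.
        if a_start <= slack and b_end >= qlen - slack and a_end <= b_start:
            trims[qname] = max(trims.get(qname, 0), qlen - b_start)
    return trims


def trim_end(seq, n):
    """Drop the last n bases (the duplicated copy of the contig start)."""
    return seq[:len(seq) - n] if n else seq
```

Any contig reported by `terminal_overlaps` is worth checking by eye (e.g. with a dot plot) before trusting the trimmed sequence as a clean circle.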
Ryan