RagTag
Different types of errors in scaffolding
Hello,
I am scaffolding a Hifiasm assembly of a selfed tetraploid plant (~400 Mb assembly, contig N50 11.7 Mb, N90 2.5 Mb) with a reference assembly created by merging the assemblies of the two diploid progenitors (not the actual parental plant, but the same species). So the contigs will go on the reference with a 1:1 ratio.
I ran the command with all default settings:
ragtag.py scaffold -t 3 -o MUR_H2_ragtag ref_tetraploid.fa contigs.fa
and I notice different behaviors in the output (I am sending you the files separately).
- hal_chr1: the second contig (ptg000024l) is placed reverse-complemented (RC) on the pseudomolecule
- hal_chr1: towards the 3' end of it, there is a contig that should go on lyr_chr1
- hal_chr1 and hal_chr2: two contigs are split half in chr1 and half in chr2 - we decided to trust this new assembly. No action here
- hal_chr5, 6, 7: the inversions/shuffling are fine
- hal_chr8: concatenated with lyr_chr8 (too much background noise from the homeolog?)
Is there an explanation for placing a contig RC when not needed, or for swapping a contig to the other homeolog? I understand that the cases with a contig going to two chromosomes will confuse the scaffolding as well.
I could fix these issues by manually editing the AGP file, but that would defeat the purpose of RagTag. Is there a way to overcome such clear misplacements? Thanks,
Dario
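For reference, the manual AGP edit mentioned above can be sketched roughly as follows. This is only an illustration on a toy file: the coordinates and file names (toy.agp, toy.fixed.agp) are made up, and it assumes the standard tab-delimited AGP layout in which column 5 is the component type, column 6 the contig ID, and column 9 the orientation.

```shell
# Toy AGP with hypothetical coordinates (not taken from the real output).
printf 'hal_chr1_RagTag\t1\t1000\t1\tW\tptg000001l\t1\t1000\t+\n' > toy.agp
printf 'hal_chr1_RagTag\t1001\t1100\t2\tU\t100\tscaffold\tyes\talign_genus\n' >> toy.agp
printf 'hal_chr1_RagTag\t1101\t1600\t3\tW\tptg000024l\t1\t500\t-\n' >> toy.agp

# Flip the orientation (column 9) of one contig; every other line is printed unchanged.
awk -v ctg="ptg000024l" 'BEGIN { FS = OFS = "\t" }
  $5 == "W" && $6 == ctg { $9 = ($9 == "-") ? "+" : "-" }
  { print }' toy.agp > toy.fixed.agp

cat toy.fixed.agp

# Assumption: the installed RagTag provides the agp2fa utility to rebuild the FASTA, e.g.
#   ragtag.py agp2fa edited.agp contigs.fa > edited.fasta
```

Flipping an orientation leaves all coordinates intact, so it is the easy case. Moving a contig to a different scaffold also means renumbering the parts and recomputing the object coordinates of both scaffolds, which is where hand editing becomes error-prone.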
Hi Dario,
I think the first thing to try is to use Nucmer. I suggest following the instructions in #48. If you think you need more specificity to distinguish between the homeologs, you can increase the values for -l and -c.
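As a rough sketch of what such a run could look like (not necessarily the exact steps from #48): this assumes a RagTag version whose scaffold command accepts --aligner and --nucmer-params, which is worth checking with ragtag.py scaffold --help. The -l (minimum exact match length) and -c (minimum cluster length) values and the output directory name are illustrative assumptions, not tuned recommendations.

```shell
# Illustrative, untuned values: larger -l and -c ask nucmer for longer exact
# matches and longer match clusters, i.e. more specific alignments.
NUCMER_PARAMS="--maxmatch -l 200 -c 1000"
OUT_DIR="MUR_H2_ragtag_nucmer"   # hypothetical output directory name

if command -v ragtag.py >/dev/null 2>&1 && [ -f ref_tetraploid.fa ] && [ -f contigs.fa ]; then
    # Same command as the original run, with the aligner swapped to nucmer.
    ragtag.py scaffold -t 3 --aligner nucmer --nucmer-params="$NUCMER_PARAMS" \
        -o "$OUT_DIR" ref_tetraploid.fa contigs.fa
else
    # RagTag or the input files are not available here: only print the command.
    echo "dry run: ragtag.py scaffold -t 3 --aligner nucmer --nucmer-params='$NUCMER_PARAMS' -o $OUT_DIR ref_tetraploid.fa contigs.fa"
fi
```

Writing to a separate output directory keeps the minimap2 and Nucmer scaffolds side by side, so the two AGP files can be compared directly.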
That might do the trick. If not, I would be happy to look at the data.
Thanks, Mike
Hi Mike,
I can't run RagTag with nucmer now because other jobs don't leave enough memory on the machine.
In the meantime, though, I realized that I had many haplotigs (even though hifiasm did not label them as such), so I ran Purge Haplotigs. It removed many of the very short sequences that align inside a larger contig (RagTag had added some of them to the scaffolds).
Re-running RagTag with only the Purge Haplotigs primary contigs and minimap2, I get a clearer scaffolding pattern, where I would just move the last contig from hal_chr1 to hal_chr2 and reverse complement hal_chr6 (quite arbitrary though).
Can it be that the many hits coming from the short contigs confuse RagTag's algorithm?
Hi there,
Good to see that purging haplotigs helped out. It's hard to say how these smaller contigs may have affected things without digging into the data a bit. Alignment is usually the biggest factor for RagTag, which is why I recommend trying both Nucmer and Minimap2 for tougher applications.
Theoretically, I would expect haplotigs to appear as false duplications in the ragtag output, but I wouldn't expect any false translocations or inversions, as you described. So that makes me guess that improved alignments account for these improvements. But I could be wrong.
Thanks, Mike