
SPAdes + RaGOO

Open francicco opened this issue 4 years ago • 3 comments

Hi,

I'm setting up a pipeline to assemble low-coverage short-read data using a reference assembly of a very closely related species (genome size: ~400 Mb). Without going into details, I have a step where I assemble the short-read dataset with SPAdes. The idea now is to use RaGOO to order and scaffold those short contigs against the reference. My first question is whether RaGOO is a good fit for this, and if so, how to run it. I gave it a small test, running it like this:

ragoo.py -t 20 deNovo_Superblocks_200.fa Herd.Pilon.Heet.fasta

It finished quickly, but the result was odd: a very large assembly size (~1 Gb).

Any advice or suggestions?

Thanks a lot Francesco

francicco avatar May 22 '20 10:05 francicco

Hi there,

With respect to the assembly size, the RaGOO scaffolds should just be an ordering and orienting of the input contigs. What was the total assembly size for the contigs? In theory, if there are a lot of contigs (and therefore gaps) the gap size may add up. Maybe you can check to see what percentage of the scaffolds is gap sequence. And it would be helpful to know the general assembly stats for your input assembly.
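As a rough way to get those numbers, here is a minimal Python sketch that reports sequence count, total size, N50, and the percentage of N bases for a FASTA file. It assumes plain (uncompressed) FASTA and that gaps are written as runs of N; the function names `read_fasta`, `n50`, and `assembly_stats` are just illustrative and not part of RaGOO. Note that any Ns already present in the input contigs are counted as well.

```python
def read_fasta(path):
    """Yield (name, sequence) pairs from an uncompressed FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the first word of the header as the name.
                name, chunks = (line[1:].split() or [""])[0], []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n50(lengths):
    """Length L such that sequences of length >= L cover half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def assembly_stats(records):
    """Summarise an iterable of (name, sequence) pairs."""
    lengths, gap_bases = [], 0
    for _, seq in records:
        lengths.append(len(seq))
        # Assumption: gaps are encoded as N (upper or lower case).
        gap_bases += seq.upper().count("N")
    total = sum(lengths)
    return {
        "sequences": len(lengths),
        "total_bp": total,
        "n50": n50(lengths),
        "gap_bp": gap_bases,
        "gap_pct": 100.0 * gap_bases / total if total else 0.0,
    }
```

Calling `assembly_stats(read_fasta("deNovo_Superblocks_200.fa"))` on the input contigs, and then the same on the RaGOO scaffold FASTA, should show whether the extra size comes from gap sequence or from the contigs themselves.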

One shortcoming of RaGOO v1 is that it was really designed for contiguous long read assemblies. RaGOO v2, which will be beta-released in the next few weeks, should be much better suited for fragmented short-read assemblies. When it is released, I will be sure to notify you here and perhaps that will serve you better.

malonge avatar May 26 '20 15:05 malonge

Hi @malonge,

I see your point. I think I also have to make sure that those fragments (contigs) are not still overlapping, which I think is the case. So far I have used Nucmer + Amos, but there are too many fragments, it takes too long, and sometimes Amos just can't handle it. Any suggestions?

Thanks, and I'll try RaGOO v2 for sure! F

francicco avatar May 26 '20 15:05 francicco

As a general rule, I would do what the VGP is doing (github). I believe they use a tool called purge_dups.

That said, these methods may not work as well on fragmented short-read assemblies. But perhaps they can point you in the right direction.

Thanks

malonge avatar May 26 '20 15:05 malonge