
BUSCO score reduces after racon polish step

Status: Open. emmannaemeka opened this issue 4 years ago • 9 comments

Hello, I noticed something strange. After running Pilon twice on the assembly, the BUSCO score was 93.6%, but it dropped to 76.4% when I ran Racon on the output of the second Pilon round.

What is the ideal polishing protocol when Illumina reads are available? Should one polish first with long reads (Racon), then with short reads (Racon), and finally with Pilon?

Thanks

emmannaemeka, Mar 02 '20 20:03

Hello, can you please paste the Racon command (and the bwa/minimap2 mapping command) you used after the two iterations of Pilon?

Best regards, Robert

rvaser, Mar 02 '20 21:03

minimap2 -ax map-ont -t 28 ~/_pilon_2x.fa ~/long_read.fq > ~/racon1x.sam

/racon -m 8 -x -6 -g -8 -w 500 -t 30 ~/long_read.fq ~/racon1x.sam ~/_pilon_2x.fa > flye_22_09_19.fa

emmannaemeka, Mar 03 '20 04:03

It's surprising that this happens. Something similar was reported by Miller et al. (2018): "Because both polishing techniques alone failed to achieve BUSCO scores equal to or better than the published reference genomes, we then polished using a combination of both Racon and Pilon. We first attempted to run Pilon and Racon in combination, one after the other (e.g., Racon, Pilon, Racon, Pilon, etc.), but found that while BUSCO scores improved with each iteration of Pilon, they then fell with each iteration of Racon."

Miller, D. E., Staber, C., Zeitlinger, J., & Hawley, R. S. (2018). Highly Contiguous Genome Assemblies of 15 Drosophila Species Generated Using Nanopore Sequencing. G3 (Bethesda, Md.), 8(10), 3131–3141. https://doi.org/10.1534/g3.118.200160

emmannaemeka, Mar 03 '20 04:03
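Miller et al. noticed the problem by checking BUSCO completeness after every polishing round. A minimal bash sketch for doing the same: it pulls the complete ("C:") percentage out of BUSCO's one-line summary and flags any round where it falls. It assumes you feed it one `<label> <summary>` line per round, in polishing order; the function names and labels are hypothetical.

```shell
#!/usr/bin/env bash
# Extract the complete-BUSCO percentage from a BUSCO one-line summary of the form
#   C:xx.x%[S:xx.x%,D:xx.x%],F:xx.x%,M:xx.x%,n:N
busco_complete() {
  printf '%s\n' "$1" | sed -n 's/.*C:\([0-9.]*\)%.*/\1/p'
}

# Read "<label> <summary>" lines on stdin (one per polishing round, in order)
# and print a message for every round whose C score is lower than the previous one.
report_drops() {
  local prev="" label line score
  while read -r label line; do
    score=$(busco_complete "$line")
    [ -n "$score" ] || continue   # skip lines without a C: field
    # awk does the numeric comparison; exit status 0 means "score < prev"
    if [ -n "$prev" ] && awk -v a="$score" -v b="$prev" 'BEGIN { exit !(a < b) }'; then
      echo "$label: BUSCO C dropped from $prev% to $score%"
    fi
    prev=$score
  done
}
```

The labelled lines can be assembled from each round's BUSCO short summary file (for example by grepping its `C:` line), so a Racon round that lowers completeness is reported as soon as it happens.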

Well, the reason is that you are using long reads in the Racon rounds and short reads in the Pilon rounds. If you polish with error-prone long reads after accurate short reads, you will get lower accuracy and thus lower BUSCO scores. You should first use Racon to polish the assembly with long reads; afterwards, you can use any combination of Racon and Pilon with short reads to further increase the accuracy.

rvaser, Mar 03 '20 12:03
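A minimal sketch of the order rvaser describes, assuming Nanopore long reads and paired-end Illumina reads. It only prints the commands, so the plan can be reviewed before running it (e.g. `polish_plan ... > plan.sh`, then `bash plan.sh`). The file names, thread count, number of rounds and the `pilon.jar` path are hypothetical placeholders; Pilon is used for every short-read round here, although rvaser notes that Racon with short reads is also an option.

```shell
#!/usr/bin/env bash
# Print (do not run) a polishing plan: long-read Racon rounds first,
# then short-read Pilon rounds. All names below are hypothetical.
polish_plan() {
  local draft=$1 long=$2 r1=$3 r2=$4 long_rounds=$5 short_rounds=$6
  local cur=$draft i
  # Long-read rounds: map long reads to the current draft, then run Racon.
  # (map-ont assumes Nanopore reads.)
  for ((i = 1; i <= long_rounds; i++)); do
    echo "minimap2 -x map-ont -t 8 $cur $long > long_$i.paf"
    echo "racon -t 8 $long long_$i.paf $cur > racon_long_$i.fa"
    cur=racon_long_$i.fa
  done
  # Short-read rounds: align the Illumina pairs with bwa, then run Pilon,
  # which writes <output prefix>.fasta.
  for ((i = 1; i <= short_rounds; i++)); do
    echo "bwa index $cur"
    echo "bwa mem -t 8 $cur $r1 $r2 | samtools sort -o short_$i.bam -"
    echo "samtools index short_$i.bam"
    echo "java -jar pilon.jar --genome $cur --frags short_$i.bam --output pilon_$i"
    cur=pilon_$i.fasta
  done
}
```

The point of the structure is that no long-read round ever runs on a draft that has already been corrected with short reads, which is the situation that lowered the BUSCO scores in this issue.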

Hello!!!!

I have the same problem, but in my case I am using only long reads to polish with Racon. I assembled with canu and Wtdbg2, and the same thing happens with both.

dtusso2020, Dec 12 '20 00:12

Hi @jforero2020, how much does the BUSCO score decrease? Which sequencing technology reads do you have? Which mapper did you use?

Best regards, Robert

rvaser, Dec 12 '20 02:12

Hi @rvaser

It decreases from 98% to 69.7%. The technology is PacBio RSII, and I am using minimap2 for mapping.

dtusso2020, Dec 12 '20 04:12
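rvaser asks about the technology and the mapper because minimap2 needs a preset (`-x`) that matches the reads. The commands earlier in this thread use `map-ont`; for PacBio RSII (CLR) reads, the matching minimap2 preset is `map-pb`. The thread does not establish whether a preset mismatch played any role here. A small hypothetical helper that picks the preset (the technology labels are my own; `map-hifi` exists only in newer minimap2 releases):

```shell
#!/usr/bin/env bash
# Map a (hypothetical) technology label to a minimap2 -x preset.
minimap2_preset() {
  case $1 in
    ont|nanopore)           echo map-ont ;;   # Oxford Nanopore reads
    rsii|pacbio-clr)        echo map-pb ;;    # PacBio CLR reads (e.g. RSII)
    hifi|ccs)               echo map-hifi ;;  # PacBio HiFi, newer minimap2 only
    illumina|short)         echo sr ;;        # short reads
    *) echo "unknown technology: $1" >&2; return 1 ;;
  esac
}
```

For example, `minimap2 -x "$(minimap2_preset rsii)" -t 24 assembly.fa reads.fq > mapped.paf` expands to the `map-pb` preset.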

Were the assemblies polished with anything else in between?

rvaser, Dec 12 '20 14:12

I have a similar situation. I tested several assemblers (canu, flye, smartdenovo and necat) with my Nanopore reads, then polished each assembly with 10 rounds of minimap2/Racon, with and without trimming. With trimming, I lost the telomeres and the BUSCO scores improved substantially:

    cp $Assembly current-assembly.fa
    for i in $(seq 1 $Iterations); do
        echo "Iteration - $i"
        minimap2 -x map-ont -t 24 current-assembly.fa $Reads > racon_round_$i.reads_mapped.paf
        racon -t 24 $Reads racon_round_$i.reads_mapped.paf current-assembly.fa > $WorkDir/racon_round_$i.fasta
        cp racon_round_$i.fasta current-assembly.fa
        cp racon_round_$i.fasta $CurDir/$OutDir/"$Prefix"_racon_round_$i.fasta
    done

Without trimming, I kept the telomeres but the BUSCO scores decreased:

    cp $Assembly current-assembly.fa
    for i in $(seq 1 $Iterations); do
        echo "Iteration - $i"
        minimap2 -x map-ont -t 24 -c current-assembly.fa $Reads > racon_round_$i.reads_mapped.paf
        racon -t 24 --no-trimming $Reads racon_round_$i.reads_mapped.paf current-assembly.fa > $WorkDir/racon_round_$i.fasta
        cp racon_round_$i.fasta current-assembly.fa
        cp racon_round_$i.fasta $CurDir/$OutDir/"$Prefix"_racon_round_$i.fasta
    done

However, after further polishing the Racon-polished assemblies (both with and without trimming) with medaka and Pilon, the BUSCO scores are better than those of the reference genome.

giriarteS, Jun 13 '22 22:06
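Since trimming in Racon removed telomeres here, one way to compare the trimmed and untrimmed outputs is to count telomeric repeats near each contig end. A minimal sketch, assuming the repeat is TTAGGG (common in vertebrates and many fungi; substitute your organism's motif) and a 100 bp window by default. For a contig in forward orientation, the start is expected to carry the reverse complement (CCCTAA) and the end the motif itself. The function name is hypothetical; output is contig name, copies at the start, copies at the end.

```shell
#!/usr/bin/env bash
# Count telomeric repeats in the first and last WINDOW bases of each contig.
# Usage: telomere_ends assembly.fasta [MOTIF] [WINDOW]
telomere_ends() {
  local fasta=$1 motif=${2:-TTAGGG} window=${3:-100}
  awk -v fwd="$motif" -v win="$window" '
    # Reverse complement of an uppercase DNA string.
    function revcomp(s,   i, c, out) {
      out = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        out = out ((c == "A") ? "T" : (c == "T") ? "A" : (c == "G") ? "C" : (c == "C") ? "G" : c)
      }
      return out
    }
    # Number of non-overlapping matches of m in s (s is a local copy).
    function count(s, m,   n) { n = gsub(m, "", s); return n }
    function report() {
      if (name == "") return
      seq = toupper(seq)
      print name, count(substr(seq, 1, win), rev), count(substr(seq, length(seq) - win + 1), fwd)
    }
    BEGIN { fwd = toupper(fwd); rev = revcomp(fwd) }
    /^>/  { report(); name = substr($1, 2); seq = ""; next }
          { seq = seq $0 }
    END   { report() }
  ' "$fasta"
}
```

Running it on, say, the final round of the trimmed and the untrimmed loops shows directly which contig ends lost their repeats.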