Racon: more variations called after two rounds of polishing with ONT reads

Hello,

Thank you very much for developing such a great polishing tool. I used Racon to polish my raw contigs (assembled from raw ONT reads) for two rounds with ONT reads. Then I called variations on both the raw contigs and the ONT-polished contigs using Illumina reads. I found more variations in the polished contigs, although the total variation length is slightly smaller. Is this normal?

As follows: ONT-polished: two rounds of Racon with ONT reads; Illumina-polished: three rounds of Pilon with Illumina reads

Thank you very much for your kind help.

Best regards, Chengcheng

Aug 06 '19 10:08 cai1991

Hi Chengcheng, what is the coverage of your dataset? Did you use a reference for variation calling, or the Illumina reads?

Best regards, Robert

Aug 07 '19 07:08 rvaser

Hi, Robert,

The coverage of my ONT reads is ~66X. I mapped the Illumina reads to the assembled contigs and called variations with GATK; no reference was used for variation calling. The coverage of the Illumina reads is ~80X, and the same reads were used for Pilon polishing.

Best regards, Chengcheng

Aug 07 '19 08:08 cai1991

Can you please check what the average quality of the ONT reads is?

Aug 07 '19 08:08 rvaser

We generated these ONT reads from 3 flowcells. The mean.q values are 9.4, 8.9 and 8.8. I merged the reads from all three flowcells and used them together.

Best regards, Chengcheng

Aug 07 '19 10:08 cai1991

There is a small chance that this is the issue, as Racon applies a quality threshold of 10 to each window. Try running one iteration with the parameter -q 8 and then call variants again.
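To see how reads with a mean quality around 9 relate to a threshold of 10, here is a minimal sketch in Python. It assumes the threshold is compared against the arithmetic mean of the Phred scores of a read segment and that qualities use the usual offset of 33; both are assumptions for illustration, and `mean_phred` and `passes_threshold` are hypothetical helpers, not Racon functions.

```python
def mean_phred(quality_string, offset=33):
    """Arithmetic mean of the Phred scores encoded in a FASTQ quality string."""
    if not quality_string:
        raise ValueError("empty quality string")
    return sum(ord(c) - offset for c in quality_string) / len(quality_string)


def passes_threshold(quality_string, threshold=10.0):
    """True if the mean Phred score reaches the threshold (hypothetical helper)."""
    return mean_phred(quality_string) >= threshold


# Toy quality strings: with offset 33, '*' encodes Q9 and '+' encodes Q10.
low_quality_segment = "*" * 50
high_quality_segment = "+" * 50

for name, segment in [("low", low_quality_segment), ("high", high_quality_segment)]:
    print(name, mean_phred(segment),
          passes_threshold(segment, 10.0),
          passes_threshold(segment, 8.0))
```

Under these assumptions, a segment with mean quality 9 would be rejected at a threshold of 10 but accepted at 8, which is the motivation for trying -q 8.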

Aug 07 '19 10:08 rvaser

Thank you for the suggestion. I will try it.

Best regards, Chengcheng

Aug 07 '19 11:08 cai1991

Hi, Robert,

I just realized that I used FASTA files (for both the sequences and the target sequences) for Racon polishing. The overlaps file is in PAF format and was generated by mapping the ONT reads (also a FASTA file) to my contigs with minimap2. Will this be a problem? How does Racon obtain quality information in this case?

Best regards, Chengcheng

Aug 07 '19 16:08 cai1991

The FASTA files will not be a problem, as Racon does not use qualities in this case. I am not sure why there is a small difference in variations. Which assembler produced the initial assembly?
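Whether quality values are available at all depends on the input format: FASTA records carry only sequences, while FASTQ records also carry a quality string. A minimal sketch in Python for checking which one a file is, based on the first record marker; `sequence_format` is a hypothetical helper for illustration and not part of Racon.

```python
import io


def sequence_format(handle):
    """Guess 'fasta' or 'fastq' from the first non-empty line of a text handle."""
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip leading blank lines
        if line.startswith(">"):
            return "fasta"  # FASTA header: no qualities in the file
        if line.startswith("@"):
            return "fastq"  # FASTQ header: qualities follow the '+' line
        break
    raise ValueError("input is not recognisable as FASTA or FASTQ")


# In-memory toy records stand in for real read files.
print(sequence_format(io.StringIO(">read1\nACGT\n")))
print(sequence_format(io.StringIO("@read1\nACGT\n+\n!!!!\n")))
```

For a real file, the same function can be called on an opened text handle, for example `with open(path) as handle: sequence_format(handle)`.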

Aug 08 '19 05:08 rvaser

I used smartdenovo to assemble the raw ONT reads. It produced very contiguous contigs, with a contig N50 of 9.2 Mb and a total contig size of 550 Mb. I was very satisfied with these assembly statistics, and the contig size was very reasonable. The complete BUSCO score of the initial assembly was 86.4%; Racon improved it to 90.6% (round 1) and 90.4% (round 2).
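For readers unfamiliar with the N50 statistic quoted above, here is a minimal sketch in Python of the usual definition: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size. The contig lengths below are toy values, not data from this assembly.

```python
def n50(lengths):
    """Contig length at which the cumulative sum (longest first) reaches half the total."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths for illustration only.
toy_lengths = [2, 3, 4, 5, 6]
print(n50(toy_lengths))
```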

Best regards, Chengcheng

Aug 08 '19 06:08 cai1991

Does smartdenovo employ any accuracy boosting during assembly or is the final error equal to the error in raw reads?

Aug 08 '19 07:08 rvaser

I'm not sure about the details of how this assembler works. But from what I read on their GitHub page, https://github.com/ruanjue/smartdenovo/blob/master/README-tools.md, the final consensus sequence is more accurate than the raw reads, reaching 99.7%. They still suggest using other tools to improve the accuracy.

Best regards, Chengcheng

Aug 08 '19 08:08 cai1991

Well, then I think it is not that surprising that the accuracy changed only a little. Maybe you can try different alignment parameters, like 2/-5/-2 or 3/-5/-4. You can also try Racon with Illumina reads.
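A minimal sketch in Python of how such a run could be assembled as a command line. It assumes the three numbers are match score, mismatch score and gap penalty, passed through Racon's -m, -x and -g options, with -q for the quality threshold; the file names are hypothetical, and `racon_command` is an illustrative helper. Please check `racon --help` for the options of your installed version before running anything.

```python
def racon_command(reads, overlaps, target, match=3, mismatch=-5, gap=-4,
                  quality_threshold=None):
    """Build an argument list for one Racon run (illustrative helper, nothing is executed)."""
    cmd = ["racon", "-m", str(match), "-x", str(mismatch), "-g", str(gap)]
    if quality_threshold is not None:
        # Only meaningful when the reads carry qualities (FASTQ input).
        cmd += ["-q", str(quality_threshold)]
    # Positional arguments: sequences, overlaps, target sequences.
    cmd += [reads, overlaps, target]
    return cmd


# Hypothetical file names; Racon writes the polished contigs to standard output.
for scores in [(2, -5, -2), (3, -5, -4)]:
    cmd = racon_command("ont_reads.fasta", "overlaps.paf", "contigs.fasta",
                        match=scores[0], mismatch=scores[1], gap=scores[2])
    print(" ".join(cmd), "> polished.fasta")
```

Building the command as a list keeps each option and value separate, so the same list can later be passed to `subprocess.run` with standard output redirected to a file.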

Aug 08 '19 08:08 rvaser