Reduced QV score and more missing and duplicated BUSCOs

Open • cjchen5 opened this issue 2 years ago • 1 comment

Describe the bug
After running Medaka, I got a reduced QV score and more missing and duplicated BUSCOs. Here is my medaka_consensus command:

    medaka_consensus -i Ae_RdUgA3_F1_male1_VBI_pass_haplotype-RemaAD.fasta.gz -d necat_polished_contigs_Ae_RdUg-A3_F1_fem2_VBInfc1-4_pass_haplotype-RemaAD.fasta -o medaka_consensus_necat_polished_contigs_Ae_RdUg-A3_F1_fem2_VBInfc1-4_pass_haplotype-RemaAD_normalq -t 128 -m r941_min_hac_g4011

Logging
my draft genome:
BUSCO: C:96.2%[S:95.1%,D:1.1%],F:2.6%,M:1.2%,n:1013
qv: 28.975

after medaka polish:
BUSCO: C:96.6%[S:95.3%,D:1.3%],F:1.3%,M:2.1%,n:1013
qv: 27.371
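
To put the two reports side by side, here is a minimal Python sketch that turns the BUSCO percentages into approximate gene counts and the QV values into per-base error rates. It assumes the reported qv is a Phred-scaled consensus quality (the post does not say which tool produced it), and the function names `parse_busco` and `phred_to_error_rate` are hypothetical helpers, not part of BUSCO or medaka.

```python
import re


def parse_busco(summary: str) -> dict:
    """Parse a BUSCO one-line summary into percentages and approximate counts."""
    pct = {k: float(v) for k, v in re.findall(r"([CSDFM]):([\d.]+)%", summary)}
    n = int(re.search(r"n:(\d+)", summary).group(1))
    # BUSCO prints rounded percentages, so these counts are estimates only.
    counts = {k: round(v * n / 100) for k, v in pct.items()}
    return {"pct": pct, "n": n, "counts": counts}


def phred_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


# Values copied from the report above.
draft = parse_busco("C:96.2%[S:95.1%,D:1.1%],F:2.6%,M:1.2%,n:1013")
polished = parse_busco("C:96.6%[S:95.3%,D:1.3%],F:1.3%,M:2.1%,n:1013")

for key in "SDFM":
    print(key, draft["counts"][key], "->", polished["counts"][key])

print("draft error rate:", phred_to_error_rate(28.975))
print("polished error rate:", phred_to_error_rate(27.371))
```

Under that assumption, the drop from QV 28.975 to 27.371 corresponds to roughly 1.3 versus 1.8 errors per 1000 bases, and the missing BUSCOs rise from about 12 to about 21 of 1013 while fragmented ones fall from about 26 to about 13.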

Environment (if you do not have a GPU, write No GPU):

  • Installation method from conda
  • OS: Red Hat Enterprise Linux Server release 7.7 (Maipo)
  • medaka version 1.4.3
  • GPU model: no GPU
  • Nvidia driver version
  • CUDA version
  • cuDNN version

Additional context
I understand this is not the latest medaka (v1.4.4), but it seems that the changes and additions from v1.4.3 to v1.4.4 won't affect my results. I'm running another polish with v1.4.4.

Thanks!

cjchen5 • Sep 28 '21 06:09

Hi @cjchen5,

I apologise for the lack of response here. I can give some insight into what may be happening here if it is still useful.

I suspect that this is a case of "over-polishing", where iterative realignment of reads to a draft progressively merges copies of similar sequence in your assembly. This can lead to reduced BUSCO scores when a consensus is constructed from reads erroneously aligned to an incorrect copy of a homologous sequence.

This is not an issue that the medaka algorithm in its current form tries to address explicitly: medaka blindly computes a consensus from the (primary) alignments provided to it by the aligner.
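
One way to check whether ambiguous read placement is plausible is to measure, per contig, what fraction of primary alignments carry a low mapping quality. The sketch below is a hypothetical diagnostic, not part of medaka: it reads plain SAM text lines, and the function name `low_mapq_fraction` and the threshold of 10 are assumptions chosen for illustration. The flag bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary) follow the SAM specification.

```python
from collections import defaultdict

# SAM FLAG bits used to identify non-primary or unmapped records.
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def low_mapq_fraction(sam_lines, mapq_threshold=10):
    """Return {contig: fraction of primary alignments with MAPQ < threshold}."""
    totals = defaultdict(int)
    low = defaultdict(int)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, contig, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # keep primary alignments only
        totals[contig] += 1
        if mapq < mapq_threshold:
            low[contig] += 1
    return {contig: low[contig] / totals[contig] for contig in totals}


# Toy records (made-up reads and contig names) to show the input shape.
example = [
    "@SQ\tSN:ctgA\tLN:1000",
    "r1\t0\tctgA\t1\t60\t4M\t*\t0\t0\tACGT\t*",
    "r2\t0\tctgA\t5\t0\t4M\t*\t0\t0\tACGT\t*",
    "r2\t256\tctgB\t5\t0\t4M\t*\t0\t0\tACGT\t*",
    "r3\t16\tctgB\t9\t3\t4M\t*\t0\t0\tACGT\t*",
]
result = low_mapq_fraction(example)
print(result)
```

Contigs where a large share of primary alignments have low mapping quality are candidates for the kind of misplacement between similar copies described above, although a high fraction alone does not prove that polishing degraded them.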

cjw85 • Nov 30 '21 10:11