pomoxis
Assess_assembly Documentation
It would be really nice if the documentation provided some explanation of the output files. Specifically these two table types.
Percentage Errors
name     mean    q10     q50     q90
err_ont  1.609%  0.731%  1.197%  3.830%
err_bal  1.621%  0.734%  1.204%  3.915%
iden     0.406%  0.109%  0.247%  1.152%
del      0.434%  0.196%  0.290%  1.017%
ins      0.783%  0.379%  0.557%  1.756%
Q Scores
name     mean   q10    q50    q90
err_ont  17.94  21.36  19.22  14.17
err_bal  17.90  21.34  19.19  14.07
iden     23.91  29.61  26.07  19.39
del      23.63  27.07  25.37  19.93
ins      21.07  24.22  22.54  17.55
"ins" and "del" are straightforward. What are the "iden", "err_ont", and "err_bal" errors? How are these Q scores computed, and what do they represent in this context?
I have the same questions. If you have any answers, please let me know.
iden is computed over aligned (non-indel) bases only: it is the proportion of those bases which are not "identical" to their reference base, i.e. the substitution rate.
err_ont and err_bal both measure the total error (substitutions, insertions, and deletions) contained within alignments. They differ in the divisor used: the former divides the error count by the alignment length, while the latter divides by the reference span. We use exclusively the former (hence the "ont" suffix); the latter was added as it is preferred by some users.
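To make the difference between the two divisors concrete, here is a minimal sketch. It is not pomoxis code: the function name and arguments are hypothetical, and it assumes that "alignment length" counts every alignment column (matches, substitutions, insertions, deletions) while "reference span" counts only the reference bases covered (matches, substitutions, deletions).

```python
def error_rates(match, sub, ins, delt):
    """Illustrative error rates from per-alignment counts (hypothetical helper).

    Assumptions (not confirmed by the pomoxis source):
      alignment length = match + sub + ins + delt
      reference span   = match + sub + delt
    """
    errors = sub + ins + delt
    aln_len = match + sub + ins + delt   # divisor for err_ont
    ref_span = match + sub + delt        # divisor for err_bal
    return {
        "err_ont": errors / aln_len,
        "err_bal": errors / ref_span,
        # substitution rate over aligned (non-indel) bases
        "iden": sub / (match + sub),
    }


rates = error_rates(match=9800, sub=50, ins=100, delt=50)
```

Under these assumptions err_bal can never be smaller than err_ont, since the reference span excludes insertions; that matches the ordering seen in the tables above.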
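The Q scores are simply the log transform of the error rates: Q = -10*log10(1 - p), where p is the accuracy as a fraction, so 1 - p is the error rate. A lower error rate therefore gives a higher Q score, which is why the q10 column holds the highest Q values.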
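As a quick check of that transform, a small sketch (the function name is my own, not part of pomoxis) that converts an error rate given as a fraction into a Q score:

```python
import math


def qscore(error_rate):
    """Phred-style transform of an error rate given as a fraction (e.g. 0.016)."""
    if error_rate <= 0:
        return math.inf  # no observed errors: Q is unbounded
    return -10 * math.log10(error_rate)


# Mean err_ont from the table above: 1.609% error
q = qscore(0.01609)
```

Applied to the mean err_ont of 1.609% this gives roughly Q17.9, in line with the 17.94 reported in the Q Scores table (the small difference comes from rounding of the printed percentage).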
One further question regarding columns 3-4 of assm_stats.txt: what is the difference between coverage and ref_coverage, and what do these values indicate? Thanks very much. Great package!
coverage measures the proportion of the assembly contig that is covered by the alignment, whereas ref_coverage measures the same for the reference sequence.
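A rough sketch of how the two values relate, assuming each is simply the aligned span divided by the full sequence length. The function and argument names are hypothetical, and the coordinates are taken to be 0-based and half-open; this is not the pomoxis implementation.

```python
def coverages(qstart, qend, contig_len, rstart, rend, ref_len):
    """Return (coverage, ref_coverage) for a single alignment.

    Assumes 0-based, half-open coordinates on both the contig (query)
    and the reference; all names here are illustrative only.
    """
    coverage = (qend - qstart) / contig_len   # fraction of the contig aligned
    ref_coverage = (rend - rstart) / ref_len  # fraction of the reference aligned
    return coverage, ref_coverage


# A 10 kb contig aligned almost end to end against a 100 kb reference
cov, ref_cov = coverages(100, 9900, 10_000, 20_000, 29_790, 100_000)
```

In this made-up case the alignment covers most of the contig but only a small part of the reference, which is the typical pattern when one reference sequence is assembled into many contigs.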
Thank you @cjw85, cheers!