wrangling-genomics

Assessing Read Quality: Improve the visualisation of Phred Quality Scores

Open theheking opened this issue 3 years ago • 1 comments

I think the "Details on the FASTQ format" section is fairly confusing for beginners. For better readability and understanding of the quality scores, I think that the sentence "This quality score is logarithmically based, so a quality score of 10 reflects a base call accuracy of 90%, but a quality score of 20 reflects a base call accuracy of 99%" should be paired with the classic table of quality scores:

| Quality Score | Probability of Base Error | Base Confidence | Sanger Encoded ASCII Character |
|---|---|---|---|
| 10 | 0.1 | 90% | "+" |
| 20 | 0.01 | 99% | "5" |
| 30 | 0.001 | 99.9% | "?" |
| 40 | 0.0001 | 99.99% | "I" |
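
For anyone who wants to check the table values, here is a minimal Python sketch of the usual Phred relationship, Q = -10 * log10(P), together with the Sanger (Phred+33) character encoding. The function names are my own and purely illustrative; they are not taken from the lesson.

```python
import math

PHRED_OFFSET = 33  # Sanger encoding: ASCII code = quality score + 33


def error_probability(q):
    """Probability that a base call with quality score q is wrong."""
    return 10 ** (-q / 10)


def quality_score(p):
    """Inverse of error_probability: Phred score for error probability p."""
    return -10 * math.log10(p)


def quality_char(q):
    """ASCII character that encodes quality score q in a Sanger FASTQ file."""
    return chr(q + PHRED_OFFSET)


# Recompute the rows of the table for Q10, Q20, Q30 and Q40.
for q in (10, 20, 30, 40):
    p = error_probability(q)
    print(f"Q{q}: P(error)={p:g}, confidence={100 * (1 - p):g}%, char={quality_char(q)!r}")
```

Each step of 10 in the score divides the error probability by 10, which is the "logarithmically based" part that the sentence in the lesson is trying to get across.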

theheking Apr 16 '21 13:04

Agreed. Also, with the increasing use of NovaSeq as a sequencing platform, it would be good to include an explanation of NovaSeq quality scores. They are binned rather than reported individually: marginal (<Q15) is reported as 12, medium (~Q20) as 23, and high (>Q30) as 37, while no-calls get a null score reported as 2 (https://www.illumina.com/content/dam/illumina-marketing/documents/products/appnotes/novaseq-hiseq-q30-app-note-770-2017-010.pdf)
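
As a rough illustration of what binned scores look like in practice, the sketch below tallies the four reported values in a quality string. It assumes the FASTQ uses the Phred+33 encoding and uses only the reported values listed above; the `NOVASEQ_BINS` mapping, the `summarize_bins` helper, and the example string are hypothetical and made up for this comment.

```python
from collections import Counter

PHRED_OFFSET = 33  # assumption: quality characters are Phred+33 encoded

# Reported values and labels as described in the comment above.
NOVASEQ_BINS = {2: "no-call", 12: "marginal", 23: "medium", 37: "high"}


def summarize_bins(quality_string):
    """Count how many bases in a quality string fall into each reported bin."""
    scores = (ord(c) - PHRED_OFFSET for c in quality_string)
    # Any score outside the four reported values is tallied as "unbinned".
    return Counter(NOVASEQ_BINS.get(q, "unbinned") for q in scores)


# Build a made-up quality string from the reported values themselves.
example = "".join(chr(q + PHRED_OFFSET) for q in (37, 37, 23, 12, 2))
print(summarize_bins(example))
```

A read from a platform that does not bin its scores would mostly land in "unbinned" here, which is one quick way to see the difference.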

amishaporet Dec 01 '22 19:12