wrangling-genomics
Assessing Read Quality: Improve the visualisation of Phred Quality Scores
I think the "Details on the FASTQ format" section is fairly confusing for beginners. To make the quality scores easier to read and understand, I think that the sentence "This quality score is logarithmically based, so a quality score of 10 reflects a base call accuracy of 90%, but a quality score of 20 reflects a base call accuracy of 99%" should be paired with the classic table of Phred quality scores:
Quality Score | Probability of Base Error | Base Confidence | Sanger Encoded ASCII Character |
---|---|---|---|
10 | 0.1 | 90% | "+" |
20 | 0.01 | 99% | "5" |
30 | 0.001 | 99.9% | "?" |
40 | 0.0001 | 99.99% | "I" |
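
For anyone who wants to check the table rows, here is a minimal Python sketch of the logarithmic relationship. It assumes Sanger (Phred+33) encoding, as in the table above; the function names are made up for illustration and are not part of the lesson.

```python
def phred_to_error_prob(q: int) -> float:
    """Probability that the base call is wrong: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def phred_to_ascii(q: int, offset: int = 33) -> str:
    """Encode a quality score as a single character (offset 33 = Sanger)."""
    return chr(q + offset)


def ascii_to_phred(ch: str, offset: int = 33) -> int:
    """Decode a single FASTQ quality character back to a quality score."""
    return ord(ch) - offset


# Print one line per table row: score, error probability, accuracy, character.
for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    accuracy = (1 - p) * 100
    print(f"Q{q}: error={p:g}, accuracy={accuracy:g}%, char={phred_to_ascii(q)!r}")
```

Every 10-point increase in the score divides the error probability by 10, which is what "logarithmically based" means in the quoted sentence.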
Agreed. Also, with the increasing use of NovaSeq as a sequencing platform, it would be good to include an explanation of NovaSeq quality scores. The quality scores are binned into four reported values: 12 for marginal (<Q15), 23 for medium (~Q20), 37 for high (>Q30), and a null score of 2 for no-calls (https://www.illumina.com/content/dam/illumina-marketing/documents/products/appnotes/novaseq-hiseq-q30-app-note-770-2017-010.pdf).
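
To illustrate what binning means in practice, here is a rough Python sketch that tallies the bins in a single FASTQ quality line. It assumes Phred+33 encoding and uses only the four reported values quoted above (2, 12, 23, 37); `NOVASEQ_BINS` and `summarise_bins` are hypothetical names, and any other score is counted as "other" rather than guessed at, since the reported values may differ between instruments and software versions.

```python
from collections import Counter

# Reported quality value -> bin label, taken from the values quoted above.
NOVASEQ_BINS = {2: "no-call", 12: "marginal", 23: "medium", 37: "high"}


def summarise_bins(quality_line: str, offset: int = 33) -> dict:
    """Count how many bases in one quality line fall into each bin."""
    counts = Counter()
    for ch in quality_line:
        q = ord(ch) - offset  # decode the character to a quality score
        counts[NOVASEQ_BINS.get(q, "other")] += 1
    return dict(counts)


# Made-up quality line for illustration: 'F' = 37, '8' = 23, '-' = 12, '#' = 2.
example = "FFFF8F-#"
print(summarise_bins(example))
```

With binned scores, a quality line contains only a handful of distinct characters instead of the full range shown in the table.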