FastQC
Wrong Illumina TruSeq adapter sequences in contaminant_list.txt?
I have the impression that the Illumina TruSeq adapter sequences from Index 13 onward are wrong. I checked a couple of sources, including the Illumina website, and from Index 13 onward the sequences differ. For example:
GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAAC  TCTCGTATGCCGTCTTCTGCTTG # Index 13, current contaminant_list.txt
GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAACAATCTCGTATGCCGTCTTCTGCTTG # Index 13, Illumina file
GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTTCCG  TCTCGTATGCCGTCTTCTGCTTG # Index 14, current contaminant_list.txt
GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTTCCGTATCTCGTATGCCGTCTTCTGCTTG # Index 14, Illumina file
(I added spaces to make the alignment easier to see.) After the 6-letter barcode, the Illumina sequences contain two bases that are missing from the current contaminant_list.txt. Additionally, for Index 23 (https://github.com/s-andrews/FastQC/blob/45c9977c69deb0341e086cceec49145211c4dcb0/Configuration/contaminant_list.txt#L114), even the barcode is different: CCACTC in the current contaminant list versus GAGTGG in Illumina's list.
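For anyone who wants to reproduce the comparison, here is a minimal Python sketch that uses only the sequences quoted in this issue. The helper names (`inserted_bases`, `reverse_complement`) are hypothetical and not part of FastQC. It assumes the two sequences differ by a single contiguous insertion, locates it by trimming the common prefix and suffix, and also checks how the two Index 23 barcodes relate to each other.

```python
# Index 13 sequences as quoted in this issue; alignment spaces are stripped.
CURRENT_13 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAAC TCTCGTATGCCGTCTTCTGCTTG".replace(" ", "")
ILLUMINA_13 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACAGTCAACAATCTCGTATGCCGTCTTCTGCTTG"

# Complement table for unambiguous DNA bases only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def inserted_bases(current, reference):
    """Return (offset, bases) for the stretch of `reference` missing from `current`.

    Assumption: `reference` equals `current` plus one contiguous insertion.
    """
    # Length of the shared prefix.
    prefix = 0
    while prefix < len(current) and current[prefix] == reference[prefix]:
        prefix += 1
    # Length of the shared suffix, capped so it cannot overlap the prefix.
    max_suffix = len(current) - prefix
    suffix = 0
    while suffix < max_suffix and current[-1 - suffix] == reference[-1 - suffix]:
        suffix += 1
    return prefix, reference[prefix:len(reference) - suffix]


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(inserted_bases(CURRENT_13, ILLUMINA_13))  # (40, 'AA')
    print(reverse_complement("GAGTGG"))             # CCACTC
```

For Index 13 this reports the two bases `AA` at offset 40 as present in Illumina's sequence but absent from the current list. The second check shows that CCACTC is the reverse complement of GAGTGG, so the Index 23 discrepancy may be a strand-orientation difference rather than a typo, though that is only a guess from the sequences themselves.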
Overall, it might be helpful to indicate the source of the information in a comment line before each block of sequences.