FastQC
Duplicated sequences in contaminant_list.txt
I find it a bit confusing that many sequences in the list are duplicated. For example, AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA
is listed 5 times under different names:
- Illumina DpnII expression PCR Primer 2
- Illumina DpnII Gex PCR Primer 2
- Illumina NlaIII expression PCR Primer 2
- Illumina NlaIII Gex PCR Primer 2
- Illumina Small RNA PCR Primer 2
As I understand it, the FastQC report will display only the first occurrence in the "Overrepresented sequences" table?
The set of sequences we have there is a bit haphazard, I'm afraid. We can't get a definitive list from the original vendors: although they will supply the sequences, you're required to agree to a license which would mean we couldn't distribute them with FastQC. Instead, we've built up a collection based on user submissions.
It's absolutely possible that the exact same sequence appears in multiple kits under different names, and yes, it will only be the first instance which is reported for a given hit.
We definitely welcome any corrections or clean-ups to the lists we have, so please do submit a pull request if you have an improved version of the current contaminants file.
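For anyone preparing such a clean-up, here is a rough sketch of how the duplicated entries could be listed. It is not part of FastQC. It assumes each non-comment line of contaminant_list.txt holds a name, one or more tab characters, and then the sequence, and that comment lines start with `#`; please check that against your copy of the file. The function names `find_duplicate_sequences` and `report` are just illustrative.

```python
import re
from collections import defaultdict


def find_duplicate_sequences(lines):
    """Group contaminant names by sequence and keep sequences with 2+ names."""
    names_by_seq = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        # Assumed layout: name, one or more tabs, sequence.
        parts = re.split(r"\t+", line)
        if len(parts) != 2:
            continue  # ignore lines that do not match the assumed layout
        name, seq = parts[0].strip(), parts[1].strip().upper()
        names_by_seq[seq].append(name)
    # Names stay in file order, so the first name in each list is the
    # first instance of that sequence in the file.
    return {seq: names for seq, names in names_by_seq.items() if len(names) > 1}


def report(path):
    """Print every duplicated sequence followed by all of its names."""
    with open(path) as handle:
        for seq, names in find_duplicate_sequences(handle).items():
            print(seq)
            for name in names:
                print(f"  - {name}")
```

Calling `report("contaminant_list.txt")` (adjust the path to wherever the file lives in your checkout) prints each duplicated sequence with its names in file order, which should make it easier to decide which entries to merge or rename.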