dada2
How can I check if the primers have already been removed from the sequences?
Hello!
I am new to this and unsure about some procedures. How can I check whether the primers have already been removed from the sequences? Let me explain:
I have received a dataset of MiSeq-sequenced microbiomes (16S) for analysis, and I am not sure whether the primers have already been removed from these fastq files. I have tried opening the fastq files manually and searching for the primer sequences by copying and pasting each primer sequence into the 'search tool', but I am not sure if this approach is correct.
Could you help me with that please?
Thanks in advance.
Have you seen our ITS tutorial workflow, in particular the section on identifying the presence of primers? https://benjjneb.github.io/dada2/ITS_workflow.html#identify-primers
That should help. That said, a visual inspection is often sufficient: just open up a fastq file and look for the primer sequence at the start of the reads (taking into account that many/most primers have some ambiguous nucleotides in them).
I use this to check for the primer at the start of my reads:
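In case it helps, here is a minimal base-R sketch of that visual check that handles ambiguous nucleotides by expanding IUPAC codes into a regular expression. The helper names (`primer_to_regex`, `frac_with_primer`), the example primer, the reads, and the file name in the comment are all made up for illustration; this is not part of dada2 and only looks for an exact match at the very start of each read.

```r
# IUPAC nucleotide ambiguity codes mapped to regex character classes
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
           K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
           H = "[ACT]", V = "[ACG]", N = "[ACGT]")

# Turn a primer into a regex anchored at the start of the read
# (hypothetical helper, not a dada2 or Biostrings function)
primer_to_regex <- function(primer) {
  bases <- strsplit(toupper(primer), "")[[1]]
  stopifnot(all(bases %in% names(iupac)))
  paste0("^", paste(iupac[bases], collapse = ""))
}

# Fraction of reads that begin with the primer (no mismatches allowed)
frac_with_primer <- function(reads, primer) {
  mean(grepl(primer_to_regex(primer), reads))
}

# Illustrative primer with two ambiguity codes (Y and M) and three made-up reads
primer <- "GTGYCAGCMGCCGCGGTAA"
reads <- c("GTGCCAGCAGCCGCGGTAATACGGAGG",
           "GTGTCAGCCGCCGCGGTAATACGTAGG",
           "TACGGAGGGTGCAAGCGTTAATCGGAA")
frac_with_primer(reads, primer)

# With a real file, the sequences are every 4th line starting at line 2
# (assuming an unwrapped fastq; "sample_R1.fastq.gz" is a hypothetical path):
# lines <- readLines("sample_R1.fastq.gz", n = 4000)
# reads <- lines[seq(2, length(lines), by = 4)]
```

If nearly every read starts with the primer, it is probably still there; a fraction close to zero suggests it was already removed, or that the primer is in a different orientation than you searched for.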
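library(Biostrings); library(ShortRead); library(magrittr)  # vmatchPattern/subseq, sread, %>%
# Search only the first nchar(primer) + 1 bases of each read, allowing up to 3 mismatches.
# fixed = FALSE lets IUPAC ambiguity codes in the primer match the bases they stand for.
temp_rev_match <- vmatchPattern(its1_forw_primer,
  sread(temp_reverse_reads) %>% subseq(., start = 1, end = nchar(its1_forw_primer) + 1),
  max.mismatch = 3, fixed = FALSE) %>% startIndex
# Keep the start position of the first hit per read, or NA when the primer was not found
temp_rev_match <- lapply(temp_rev_match, function(x) ifelse(length(x) > 0, x, NA)) %>% unlist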
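To turn that per-read vector into a yes/no answer, something like the following could work. It is only a sketch: `summarise_primer_hits` is a hypothetical helper, and the `temp_rev_match` values here are made up to stand in for the output of the snippet above (start position of the first hit, or NA).

```r
# Stand-in for the result of the snippet above (made-up values):
# start position of the first primer hit per read, NA where none was found
temp_rev_match <- c(1L, 1L, NA, 2L, 1L, NA, 1L, 1L)

# Hypothetical helper: count reads with a primer hit and hits starting at base 1
summarise_primer_hits <- function(hits) {
  found <- !is.na(hits)
  c(n_reads          = length(hits),
    with_primer      = sum(found),
    frac_with_primer = mean(found),
    frac_at_pos1     = mean(found & hits == 1))
}

summarise_primer_hits(temp_rev_match)
```

A `frac_with_primer` near 1 would suggest the primers are still on the reads, while a value near 0 would suggest they were trimmed before you received the files (or that a different primer or orientation needs checking).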