dada2

How can I check if the primers have already been removed from the sequences?

Open otaviolovison opened this issue 2 years ago • 2 comments

Hello!

I am new to this and unsure about some procedures. How can I check whether the primers have already been removed from the sequences? Let me explain:

I have received a dataset of MiSeq-sequenced microbiomes (16S) for analysis, and I am not sure whether the primers have already been removed from these fastq files. I have tried opening the fastq files manually and searching for the primer sequences by copying and pasting each primer sequence into the search tool, but I am not sure if this approach is correct.

Could you help me with that please?

Thanks in advance.

otaviolovison avatar Apr 08 '22 18:04 otaviolovison

Have you seen our ITS tutorial workflow, in particular the section on identifying the presence of primers? https://benjjneb.github.io/dada2/ITS_workflow.html#identify-primers

That should help. That said, a visual inspection is often sufficient: just open a fastq file and look for the primer sequence at the start of the reads (taking into account that many/most primers contain some ambiguous nucleotides).
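
For anyone who wants to automate that visual check, here is a minimal base-R sketch. It is not taken from the tutorial; the function names `primer_to_regex` and `primer_start_fraction` and the toy reads are made up for illustration. It assumes plain 4-line FASTQ records and only counts exact matches at the very start of each read (ambiguity codes allowed, no mismatches or offsets).

```r
# IUPAC nucleotide codes mapped to regular-expression character classes.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           R = "[AG]", Y = "[CT]", S = "[CG]", W = "[AT]",
           K = "[GT]", M = "[AC]", B = "[CGT]", D = "[AGT]",
           H = "[ACT]", V = "[ACG]", N = "[ACGT]")

# Turn a primer with ambiguity codes into a regex anchored at the read start.
primer_to_regex <- function(primer) {
  bases <- strsplit(toupper(primer), "")[[1]]
  stopifnot(all(bases %in% names(iupac)))
  paste0("^", paste(iupac[bases], collapse = ""))
}

# Fraction of the first n_reads reads that begin with the primer.
# Assumes 4 lines per record, with the sequence on the second line.
primer_start_fraction <- function(fastq_path, primer, n_reads = 10000) {
  lines <- readLines(fastq_path, n = 4 * n_reads)
  seqs <- lines[seq(2, length(lines), by = 4)]
  mean(grepl(primer_to_regex(primer), seqs))
}

# Toy example: two made-up reads, only the first starts with the primer.
toy_fastq <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "GTGCCAGCAGCCGCGGTAATACG", "+", "IIIIIIIIIIIIIIIIIIIIIII",
             "@r2", "TACGGAGGGTGCAAGCGTTAATC", "+", "IIIIIIIIIIIIIIIIIIIIIII"),
           toy_fastq)
toy_fraction <- primer_start_fraction(toy_fastq, "GTGYCAGCMGCCGCGGTAA")
```

A fraction close to 1 on real data would suggest the primers are still on the reads, while a fraction close to 0 would suggest they were already removed (or sit in a different orientation or at an offset, which this sketch does not check).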

benjjneb avatar Apr 09 '22 20:04 benjjneb

I use this to check for the primer at the start of my reads:

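library(Biostrings); library(ShortRead); library(magrittr)  # vmatchPattern/subseq/startIndex, sread, %>%
# Match the primer against the first nchar(primer)+1 bases of each read; fixed = FALSE honors IUPAC ambiguity codes
temp_rev_match <- vmatchPattern(its1_forw_primer,
                                sread(temp_reverse_reads) %>% subseq(start = 1, end = nchar(its1_forw_primer) + 1),
                                max.mismatch = 3, fixed = FALSE) %>% startIndex
# Keep the start position of the first hit per read, or NA where the primer was not found
temp_rev_match <- lapply(temp_rev_match, function(x) if (length(x) > 0) x[1] else NA) %>% unlist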
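
One possible way to read the resulting vector is sketched below in base R. The values are invented for illustration (they are not output from real data), and `primer_found` and `hit_fraction` are hypothetical names.

```r
# Hypothetical result: match start position per read, NA where no primer was found.
temp_rev_match <- c(1, 1, NA, 2, 1, NA, 1, 1)

# TRUE for reads in which the primer was detected near the start.
primer_found <- !is.na(temp_rev_match)

# Fraction of reads carrying the primer.
hit_fraction <- mean(primer_found)

# Distribution of start positions, including the reads without a hit.
start_counts <- table(temp_rev_match, useNA = "ifany")
```

If nearly all reads have a hit at position 1, the primers are most likely still present and would need to be trimmed before further processing.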

Andreas-Bio avatar Apr 14 '22 11:04 Andreas-Bio