Bad quality forward and reverse sequences

Open • darjniki opened this issue 2 years ago • 1 comment

Hi all,

I am analyzing 16S rRNA gene amplicon data from fecal samples (V1-V2 region, primers 27F and 338R, MiSeq 2x300 bp).

According to the quality plots, the sequencing did not go well, especially for the reverse sequences:

[Image: quality plots]

Based on the quality plots, I decided to trim the first and last 50 nt from the forward sequences and the first and last 100 nt from the reverse sequences (trimLeft = c(50, 100), trimRight = c(50, 100)), since I read that it is critical at this stage to remove as many poor-quality nucleotides as possible (while still keeping at least 10 percent of the sequences after filtering).
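One way to check whether trimming this much still leaves room for merging is to work out which amplicon positions each trimmed read covers. The following is a minimal Python sketch of that arithmetic, not anything from dada2; the helper names are hypothetical, and it assumes reads start at the first primer base (no spacers) and an amplicon length of roughly 330 to 370 bp for 27F/338R, which is an assumption since V1-V2 length varies between taxa.

```python
# Hypothetical helpers (not dada2 functions): which amplicon positions does each
# trimmed read still cover? Intervals are half-open [start, end) in amplicon
# coordinates. Assumes every read starts at the first primer base (no spacers).

def forward_span(read_len, trim_left, trim_right):
    # Forward-read position i sits at amplicon position i.
    return trim_left, read_len - trim_right


def reverse_span(read_len, trim_left, trim_right, amplicon_len):
    # Reverse-read position i sits at amplicon position amplicon_len - 1 - i.
    first_kept = trim_left
    last_kept = read_len - trim_right - 1
    return amplicon_len - 1 - last_kept, amplicon_len - first_kept


def overlap(a, b):
    # Number of amplicon positions covered by both intervals.
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


if __name__ == "__main__":
    fwd = forward_span(300, 50, 50)  # 50 nt trimmed from each end of the forward read
    for amplicon_len in (330, 350, 370):  # assumed amplicon lengths
        rev = reverse_span(300, 100, 100, amplicon_len)  # 100 nt from each end of the reverse
        ov = overlap(fwd, rev)
        new_bases = (rev[1] - rev[0]) - ov  # reverse bases not already covered by forward
        print(f"amplicon={amplicon_len} fwd={fwd} rev={rev} overlap={ov} new={new_bases}")
```

Under these assumptions the trimmed forward reads cover amplicon positions 50 to 249, and the trimmed reverse reads add little or no sequence beyond that (0 new bases at 330 or 350 bp, 20 at 370 bp).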

Next, I tried truncQ = 2 and truncQ = 5; since there was not much difference in the number of reads retained, I settled on the higher value, 5.
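The truncQ comparison can be sketched in a few lines. dada2's documentation describes truncQ as truncating reads at the first instance of a quality score less than or equal to truncQ; the Python below mimics that rule on a hypothetical read. The helper names are illustrative, not dada2 code, and Phred+33 quality encoding is assumed.

```python
# Sketch of the truncQ rule: cut the read at the first base whose quality
# is <= truncQ. Plain Python illustration, not dada2's implementation.

def phred33(qual_string):
    # Decode an Illumina 1.8+ (Phred+33) FASTQ quality string into integers.
    return [ord(c) - 33 for c in qual_string]


def truncate_at_q(seq, quals, trunc_q):
    # Return the prefix of seq/quals that precedes the first quality <= trunc_q.
    for i, q in enumerate(quals):
        if q <= trunc_q:
            return seq[:i], quals[:i]
    return seq, quals


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    quals = phred33("IIII&III#I")  # hypothetical read: Q40s, one Q5, one Q2
    for tq in (2, 5):
        kept, _ = truncate_at_q(seq, quals, tq)
        print(f"truncQ={tq}: kept {len(kept)} of {len(seq)} bases")
```

Switching from 2 to 5 only changes a read if it contains a base scored 3 to 5 ahead of its first base scored 2 or lower, which may be why the two settings retained similar numbers of reads.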

After that, I tried several maxEE values from 2 to 6 (always higher for the reverse sequences). In the end, I settled on maxEE = c(4, 5).

[Image]
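maxEE filters on expected errors, which dada2's documentation defines from the nominal quality scores as EE = sum(10^(-Q/10)). The sketch below applies that definition to hypothetical 100-nt reverse reads (300 nt minus the 100 + 100 trimmed) with uniform quality, against a threshold of 5; the function names are illustrative only.

```python
import math

# Expected errors as defined for maxEE: EE = sum(10^(-Q/10)) over the read's
# quality scores after trimming/truncation. Plain Python sketch, not dada2 code.

def expected_errors(quals):
    return sum(10 ** (-q / 10) for q in quals)


def passes_max_ee(quals, max_ee):
    # Reads with more than max_ee expected errors are discarded.
    return expected_errors(quals) <= max_ee


if __name__ == "__main__":
    # Hypothetical 100-nt reverse reads with a uniform quality score.
    for q in (20, 14, 13):
        quals = [q] * 100
        ok = passes_max_ee(quals, 5)
        print(f"Q{q}: EE = {expected_errors(quals):.2f}, passes maxEE=5: {ok}")
```

At a uniform Q14 such a read passes with roughly 4 expected errors, while Q13 (about 5.01) is already rejected, so even reads that pass maxEE = 5 can carry several expected errors each.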

Forward:

[Image]

Reverse:

[Image]

After the first filtering step, about 48% of reads were left; after merging, about 30% were left. Final table:

[Image: final table]
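Both percentages are relative to the input reads, so the loss at the merging step itself is larger than the headline numbers suggest. The sketch below computes step-wise and cumulative retention; the counts are hypothetical, chosen only to reproduce the approximate 48% and 30% figures, and the function name is illustrative.

```python
# Step-wise and cumulative read retention, in the style of a read-tracking table.
# Counts are hypothetical placeholders matching the ~48% / ~30% figures.

def retention(steps):
    # steps: list of (name, read_count); the first entry is the input.
    start = steps[0][1]
    prev = start
    rows = []
    for name, count in steps:
        rows.append((name, count, count / prev, count / start))
        prev = count
    return rows


if __name__ == "__main__":
    track = [("input", 100_000), ("filtered", 48_000), ("merged", 30_000)]
    for name, n, step, total in retention(track):
        print(f"{name:>9}: {n:>7}  step {step:6.1%}  cumulative {total:6.1%}")
```

With these numbers, merging keeps 62.5% of the filtered reads, i.e. roughly 37.5% of the reads that survived filtering are lost at merging.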

Do you think it is possible to work further with such data? Is there anything else I could do with this data?

I read that in some cases people work only with forward sequences. But in this case, the forward sequences did not go very well ...

Thank you in advance for your help.

darjniki • May 03 '22 13:05

> I read that in some cases people work only with forward sequences. But in this case, the forward sequences did not go very well ...

Given what you've shown, that would still be my recommendation. Sticking with the somewhat higher-quality forward sequences costs a bit in terms of taxonomic resolution (although probably barely perceptible there), but with data like this it is a better bet for keeping effective sequencing depth and avoiding merging biases driven by the lower-quality reverse reads.

benjjneb • May 03 '22 13:05
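The effective-depth argument can be illustrated with a toy count. In paired mode, dada2's filterAndTrim outputs a read pair only if both the forward and the reverse read pass, whereas a forward-only analysis needs just the forward read to pass. The expected-error values below are made up, the thresholds mirror the maxEE = c(4, 5) used in the question, and the function name is hypothetical.

```python
# Toy comparison of reads surviving paired vs forward-only filtering.
# Paired mode keeps a pair only if both reads pass their maxEE threshold.
# Expected-error values are hypothetical.

def surviving(fwd_ee, rev_ee, max_ee_fwd, max_ee_rev):
    paired = sum(
        1 for f, r in zip(fwd_ee, rev_ee) if f <= max_ee_fwd and r <= max_ee_rev
    )
    forward_only = sum(1 for f in fwd_ee if f <= max_ee_fwd)
    return paired, forward_only


if __name__ == "__main__":
    fwd_ee = [0.5, 1.2, 3.8, 0.9, 4.6, 2.0]
    rev_ee = [2.1, 6.3, 4.9, 7.5, 3.0, 5.4]
    paired, fwd_only = surviving(fwd_ee, rev_ee, 4, 5)
    print(f"paired filtering keeps {paired} of {len(fwd_ee)}; forward-only keeps {fwd_only}")
```

Here paired filtering keeps 2 of 6 reads versus 5 of 6 for forward-only, and in a paired workflow merging removes further pairs after that.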