vsearch
After combining two datasets and reclustering, the abundance of some high-abundance OTUs dropped dramatically.
I have two datasets, A and B. Dataset A contains a high-abundance OTU (id: OTU_54). To compare the abundance of OTU_54 between the two datasets, I pooled the raw sequencing data of A and B (A+B) and clustered it following the example steps provided on the website, with the same parameters used when A and B were analyzed separately. In the otutab(A+B) produced by the new clustering, the OTU_54 from the original A dataset had very low abundance.
So I compared all.nonchimeras.fasta (the file produced before clustering at 97% similarity) from A, B, and A+B with OTU_54 using BLAST. I filtered the BLAST results by identity > 97% and alignment length > 300, counted the matches, and found that A+B had lost a lot of OTU_54.
wc -l filt_nonchim* # filtered blast results.
76966 filt_nonchim18.txt #generated from datasetB
157240 filt_nonchim19.txt #generated from datasetA
12369 filt_nonchim.txt #generated from A+B
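For reference, the filter described above can be sketched roughly as follows. This assumes the BLAST results are in the 12-column tabular format (`-outfmt 6`), where column 3 is percent identity and column 4 is alignment length. The function name `count_hits` is hypothetical, and the format is an assumption since the thread does not state it.

```python
def count_hits(lines, min_identity=97.0, min_length=300):
    """Count BLAST tabular hits above both thresholds.

    Assumes -outfmt 6 column order: qseqid, sseqid, pident, length, ...
    """
    kept = 0
    for line in lines:
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])   # percent identity
        length = int(fields[3])     # alignment length
        if pident > min_identity and length > min_length:
            kept += 1
    return kept


# Made-up example rows, only to show the expected input shape.
example = [
    "read1\tOTU_54\t99.50\t400\t2\t0\t1\t400\t1\t400\t0.0\t700",
    "read2\tOTU_54\t96.00\t400\t16\t0\t1\t400\t1\t400\t0.0\t600",
    "read3\tOTU_54\t98.00\t250\t5\t0\t1\t250\t1\t250\t1e-120\t430",
]
n_kept = count_hits(example)
```

Only the first example row passes both thresholds. With a real file, the lines can be passed in directly, e.g. `count_hits(open("filt_input.txt"))`, where the file name is a placeholder.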
How can I address or optimize the analysis process? Thanks!
> and found that the OTU_54 in the original A dataset had very low abundance in the otutab(A+B) produced by the new clustering.
This is a known downside of centroid-based, fixed-threshold clustering: some clusters shrink or disappear when more data is added.
A given centroid1 can be abundant in sample A, yet lie close to a more abundant centroid2 present in sample B. If you cluster A+B, centroid2 captures some or all of the reads initially captured by centroid1.
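The effect can be reproduced with a toy model. The sketch below is not vsearch's algorithm: sequences are reduced to integer positions on a line, distance is the absolute difference, a threshold of 3 stands in for 97% identity, and each sequence joins the closest existing centroid within the threshold. All names and numbers are made up for illustration.

```python
from collections import Counter

THRESHOLD = 3  # toy stand-in for 3% divergence (97% identity)


def greedy_cluster(reads, threshold=THRESHOLD):
    """Toy greedy centroid-based clustering.

    reads: dict mapping a position (a toy 'sequence') to its abundance.
    Returns a dict mapping each centroid position to its cluster size.
    """
    centroids = []
    sizes = Counter()
    # Process sequences by decreasing abundance, as in size-ordered clustering.
    for pos, abundance in sorted(reads.items(), key=lambda kv: -kv[1]):
        hits = [c for c in centroids if abs(c - pos) <= threshold]
        if hits:
            # Join the closest centroid within the threshold.
            target = min(hits, key=lambda c: abs(c - pos))
        else:
            # No centroid is close enough: this sequence seeds a new cluster.
            centroids.append(pos)
            target = pos
        sizes[target] += abundance
    return dict(sizes)


dataset_a = {0: 100, 3: 40}   # centroid1 at 0, plus a variant at 3
dataset_b = {4: 500}          # more abundant centroid2 at 4
combined = {**dataset_a, **dataset_b}  # positions do not overlap in this toy

sizes_a = greedy_cluster(dataset_a)    # {0: 140}
sizes_ab = greedy_cluster(combined)    # {4: 540, 0: 100}
```

Clustered alone, A gives centroid1 a size of 140. After pooling, the variant at position 3 is closer to centroid2, so centroid1 drops to 100 while centroid2 grows to 540. The total stays at 640, so nothing is lost.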
> and checked the number of matches, and found that A+B lost a lot of OTU_54.
If I understand correctly, reads from OTU_54 are not lost; they have been redistributed into other OTUs. There is not much that can be done to mitigate that downside.
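One way to check where the reads went is to tally cluster assignments from a `.uc` file, assuming the A+B clustering was run with `--uc`. The sketch below relies on the tab-separated uc layout, where column 1 is the record type (`S` for a centroid, `H` for a hit), column 9 is the query label and column 10 is the centroid label for hits. The helper names, the `;size=` parsing and the example labels are assumptions for illustration.

```python
import re
from collections import Counter

SIZE_RE = re.compile(r";size=(\d+)")


def abundance(label):
    """Return the ';size=N' annotation of a label, or 1 if absent."""
    match = SIZE_RE.search(label)
    return int(match.group(1)) if match else 1


def tally_by_centroid(uc_lines, labels_of_interest):
    """Sum the abundance of selected sequences per centroid they joined."""
    totals = Counter()
    for line in uc_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue
        record, query, target = fields[0], fields[8], fields[9]
        if query not in labels_of_interest:
            continue
        if record == "S":
            # The sequence is itself a centroid.
            totals[query] += abundance(query)
        elif record == "H":
            # The sequence was assigned to an existing centroid.
            totals[target] += abundance(query)
        # 'C' records repeat cluster summaries and are ignored here.
    return totals


# Made-up uc lines: seqA and seqB stand for sequences that formed one OTU in A.
uc_example = [
    "S\t0\t400\t*\t*\t*\t*\t*\tseqX;size=500\t*",
    "S\t1\t400\t*\t*\t*\t*\t*\tseqA;size=100\t*",
    "H\t0\t400\t99.0\t+\t0\t0\t400M\tseqB;size=40\tseqX;size=500",
]
totals = tally_by_centroid(uc_example, {"seqA;size=100", "seqB;size=40"})
```

In this made-up example, `seqA` keeps its own cluster with 100 reads, while the 40 reads of `seqB` end up under `seqX`.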
Thanks so much!