vsearch: After combining two datasets and reclustering, the abundance of some high-abundance OTUs dropped dramatically.

Open peiyaohu opened this issue 2 years ago • 2 comments

I have two datasets, A and B. Dataset A contains a high-abundance OTU (id: OTU_54). To compare the abundance of OTU_54 between the two datasets, I pooled the raw sequencing data of A and B (=> A+B) and clustered it following the example steps provided on the website, with the same parameters used when A and B were analyzed separately, and found that the OTU_54 in the original A dataset had very low abundance in the otutab(A+B) produced by the new clustering.

So I ran BLAST on all.nonchimeras.fasta (the file before clustering at 97% similarity) of A, B, and A+B against OTU_54, filtered the BLAST results by identity > 97% and alignment length > 300, and checked the number of matches, and found that A+B lost a lot of OTU_54.

wc -l filt_nonchim*          # filtered BLAST results
   76966 filt_nonchim18.txt  # generated from dataset B
  157240 filt_nonchim19.txt  # generated from dataset A
   12369 filt_nonchim.txt    # generated from A+B
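
For reference, the filtering step described above could be sketched as follows. This is only an illustration under assumptions not stated in the post: the BLAST results are assumed to be in tabular format (percent identity in column 3, alignment length in column 4), and the label in column 1 is assumed to carry a `;size=N` abundance annotation as written by vsearch. The function name and the example rows are made up. Counting lines counts matching sequences, so the sketch also sums the annotated abundances, which may be closer to a read count if the file is dereplicated.

```python
import re

# Abundance annotation such as ";size=120" (assumed to be present in labels).
SIZE_RE = re.compile(r";size=(\d+)")

def count_filtered_hits(lines, min_identity=97.0, min_length=300):
    """Return (number of hits, summed abundance) for hits passing both filters."""
    hits = 0
    reads = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        identity = float(fields[2])  # column 3: percent identity
        length = int(fields[3])      # column 4: alignment length
        if identity > min_identity and length > min_length:
            hits += 1
            match = SIZE_RE.search(fields[0])
            # A label without an annotation counts as a single read.
            reads += int(match.group(1)) if match else 1
    return hits, reads

# Made-up rows, for illustration only.
example = [
    "seq1;size=120\tOTU_54\t99.5\t410\t2\t0\t1\t410\t1\t410\t0.0\t700",
    "seq2;size=3\tOTU_54\t97.0\t405\t12\t0\t1\t405\t1\t405\t0.0\t650",
    "seq3;size=8\tOTU_54\t98.2\t250\t4\t0\t1\t250\t1\t250\t1e-120\t430",
    "seq4\tOTU_54\t97.8\t402\t9\t0\t1\t402\t1\t402\t0.0\t660",
]
print(count_filtered_hits(example))  # (2, 121)
```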

How can I address or optimize the analysis process? Thanks!

Feb 28 '23 09:02 peiyaohu

> and found that the OTU_54 in the original A dataset had very low abundance in the otutab(A+B) produced by the new clustering.

This is a known downside of using a centroid-based, fixed-threshold clustering approach: some clusters shrink or disappear when more data is added.

A given centroid1 can be abundant in a sample A, but close to a more abundant centroid2 present in a sample B. If you cluster A+B, then centroid2 captures some or all of the reads initially captured by centroid1.
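
The capture effect can be reproduced with a toy model. The sketch below is a simplified greedy clustering, not vsearch's actual implementation: identity is computed position by position on equal-length strings (no alignment), each sequence joins the first centroid at or above the threshold, and all names, sequences, and read counts are invented for the illustration.

```python
def identity(a, b):
    """Percent identity between two equal-length toy sequences (no gaps)."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def greedy_cluster(abundances, threshold=97.0):
    """Greedy centroid clustering in order of decreasing abundance.

    abundances: {name: (sequence, reads)}
    Returns {centroid_name: total_reads_in_cluster}.
    """
    ordered = sorted(abundances.items(), key=lambda kv: -kv[1][1])
    centroids = {}  # centroid name -> sequence, most abundant first
    sizes = {}      # centroid name -> reads captured so far
    for name, (seq, reads) in ordered:
        for cname, cseq in centroids.items():
            if identity(seq, cseq) >= threshold:
                sizes[cname] += reads  # captured by an existing centroid
                break
        else:
            centroids[name] = seq      # no centroid close enough: new cluster
            sizes[name] = reads
    return sizes

# Toy data: centroid1 dominates sample A, centroid2 dominates sample B.
sample_a = {
    "centroid1": ("CC" + "A" * 98, 500),
    "variant": ("CCCC" + "A" * 96, 50),  # 98% to centroid1, 96% to centroid2
}
sample_b = {"centroid2": ("A" * 100, 1000)}

print(greedy_cluster(sample_a))                   # {'centroid1': 550}
print(greedy_cluster({**sample_a, **sample_b}))   # {'centroid2': 1500, 'variant': 50}
```

In A alone, centroid1 seeds a cluster of 550 reads. In A+B, centroid1 sits within 97% of the more abundant centroid2 and is absorbed, while its former variant is too far from centroid2 and is left as a small cluster of its own.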

> and checked the number of matches, and found that A+B lost a lot of OTU_54.

If I understand correctly, reads from OTU_54 are not lost, but were redistributed into other OTUs. There is not much that can be done to mitigate that downside.
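
One way to check the redistribution is to follow individual reads from one clustering to the other. The sketch below assumes you already have two read-to-OTU mappings (for example, parsed from the assignment output of each run); the function name, the mappings, and the `AB_` prefix on the pooled run's labels are hypothetical.

```python
from collections import Counter

def redistribution(before, after, otu):
    """Tally where the reads assigned to `otu` in `before` ended up in `after`.

    before, after: {read_id: otu_label}
    Reads missing from `after` are reported as "unassigned".
    """
    tally = Counter()
    for read_id, old_otu in before.items():
        if old_otu == otu:
            tally[after.get(read_id, "unassigned")] += 1
    return tally

# Made-up assignments, for illustration only.
before = {"r1": "OTU_54", "r2": "OTU_54", "r3": "OTU_54", "r4": "OTU_7"}
after = {"r1": "AB_OTU_2", "r2": "AB_OTU_2", "r3": "AB_OTU_90", "r4": "AB_OTU_7"}

print(redistribution(before, after, "OTU_54"))
# Counter({'AB_OTU_2': 2, 'AB_OTU_90': 1})
```

If OTU labels are generated independently in each run, the label OTU_54 in A+B need not refer to the same cluster as OTU_54 in A, so comparing by read identifier is safer than comparing by OTU label.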

Feb 28 '23 14:02 frederic-mahe

Thanks so much!

Mar 01 '23 03:03 peiyaohu
