Discrepancies Found between RabbitTClust and NCBI Clustering Result

Open amyliufda opened this issue 11 months ago • 2 comments

In recent runs against the latest NCBI Listeria dataset, we've observed large discrepancies between the RabbitTClust and NCBI clustering results. Here are a few examples.

  1. With a distance threshold < 0.0003, SRR2051098 is clustered with 34 other isolates in the NCBI result (https://www.ncbi.nlm.nih.gov/pathogens/tree/#Listeria/PDG000000001.3630/PDS000003342.11?accessions=PDT000066179.2). RabbitTClust does not cluster it with those 34 isolates, but with other isolates that mostly come from a different NCBI cluster.
  2. With a distance threshold < 0.0003, SRR4416146 is clustered with 18 other isolates in the NCBI result (https://www.ncbi.nlm.nih.gov/pathogens/tree/#Listeria/PDG000000001.3630/PDS000003335.20?accessions=PDT000151961.2). In RabbitTClust it is a singleton and is not clustered with any other isolate.
  3. With a distance threshold >= 0.0003, SRR2051098 ends up in one big cluster with thousands of other isolates, and SRR4416146 is still by itself.
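For anyone who wants to reproduce this kind of per-isolate comparison, here is a minimal sketch. It assumes both results have already been parsed into a mapping from isolate accession to cluster label. The function names, the isolate IDs (`iso_A` and so on), and the cluster labels are all hypothetical placeholders, not actual RabbitTClust or NCBI output.

```python
from typing import Dict, Set


def co_members(assignments: Dict[str, str], isolate: str) -> Set[str]:
    """Return the other isolates that share a cluster label with `isolate`."""
    label = assignments[isolate]
    return {
        name
        for name, other_label in assignments.items()
        if other_label == label and name != isolate
    }


def compare_isolate(
    ncbi: Dict[str, str], rabbit: Dict[str, str], isolate: str
) -> dict:
    """Compare the cluster neighbours of one isolate across two results."""
    ncbi_set = co_members(ncbi, isolate)
    rabbit_set = co_members(rabbit, isolate)
    union = ncbi_set | rabbit_set
    return {
        "shared": ncbi_set & rabbit_set,
        "ncbi_only": ncbi_set - rabbit_set,
        "rabbit_only": rabbit_set - ncbi_set,
        # Jaccard similarity of the two neighbour sets; defined as 1.0
        # when the isolate is a singleton in both results.
        "jaccard": len(ncbi_set & rabbit_set) / len(union) if union else 1.0,
    }


# Toy data only (hypothetical isolates and labels).
ncbi = {"iso_A": "N1", "iso_B": "N1", "iso_C": "N1", "iso_D": "N2"}
rabbit = {"iso_A": "R1", "iso_B": "R2", "iso_C": "R2", "iso_D": "R1"}

report = compare_isolate(ncbi, rabbit, "iso_A")
print(report)
```

A Jaccard value near 0 for an isolate means the two tools place it among almost entirely different neighbours, which is the pattern described for SRR2051098 above.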

We understand that different thresholds produce different results. However, we did not expect to see differences this large between NCBI and RabbitTClust. SRR2051098 and SRR4416146 are just two random examples, and there are others like them. Is there an explanation for why the RabbitTClust results differ so much from the NCBI results? Thank you.

amyliufda, Mar 21 '24 14:03