
Compare distance and clustering methods for tip_glom

Open mikemc opened this issue 4 years ago • 1 comments

In phyloseq::tip_glom, the line

  dd <- as.dist(ape::cophenetic.phylo(phy_tree(physeq)))

is very slow and memory-intensive for large datasets (10K+ taxa). We need to find out whether the cost comes from the call to ape, and whether there are better alternatives, e.g. in the castor package.
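As a rough way to quantify the problem, the sketch below estimates the size of the dense distance matrix and times the call on random trees. The helper name `cophenetic_bytes` and the tree sizes are illustrative assumptions, not part of phyloseq. The castor call is `get_all_pairwise_distances()` as I understand its documented interface; it also returns a dense matrix, so at best it would help with speed rather than memory.

```r
# Raw storage for the pairwise distances of an n-tip tree:
# n^2 doubles (8 bytes each) for the full matrix returned by
# cophenetic.phylo(), plus n*(n-1)/2 doubles for the "dist" copy.
cophenetic_bytes <- function(n_tips) {
  c(full_matrix = n_tips^2 * 8,
    dist_object = n_tips * (n_tips - 1) / 2 * 8)
}
cophenetic_bytes(10000) / 1e6  # in MB: full_matrix 800, dist_object 399.96

# Timing on random trees (sizes chosen arbitrarily); skipped if ape is missing.
if (requireNamespace("ape", quietly = TRUE)) {
  set.seed(1)
  for (n in c(500, 1000, 2000)) {
    tree <- ape::rtree(n)
    t_ape <- system.time(
      dd <- as.dist(ape::cophenetic.phylo(tree))
    )[["elapsed"]]
    message(n, " tips, ape: ", t_ape, " s, ",
            format(object.size(dd), units = "MB"))

    # Candidate alternative (assumes castor is installed).
    if (requireNamespace("castor", quietly = TRUE)) {
      t_castor <- system.time(
        m <- castor::get_all_pairwise_distances(tree, only_clades = seq_len(n))
      )[["elapsed"]]
      message(n, " tips, castor: ", t_castor, " s")
    }
  }
}
```

If the ape timings grow roughly quadratically with the number of tips, the cost is inherent to building the full matrix rather than specific to ape.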

By default, phyloseq uses cluster::agnes for the clustering, which seems to be much slower than stats::hclust. We should look into this more, and also compare both to DECIPHER::IdClusters.
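A minimal sketch of the agnes vs. hclust comparison follows, on simulated data rather than a real phylogeny. It assumes average linkage for both (the agnes default; hclust defaults to "complete", so the method has to be set explicitly for a like-for-like comparison). The helper `same_partition`, the data size, and the cut height are illustrative choices.

```r
# Two cluster-label vectors describe the same partition if every label
# in one maps to exactly one label in the other, and vice versa.
same_partition <- function(a, b) {
  tab <- table(a, b)
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

set.seed(1)
x  <- matrix(rnorm(300 * 5), ncol = 5)  # 300 simulated "taxa"
dd <- dist(x)

t_hclust <- system.time(
  hc <- stats::hclust(dd, method = "average")
)[["elapsed"]]
cl_hclust <- cutree(hc, h = 2)

# cluster is a recommended package and normally ships with R.
if (requireNamespace("cluster", quietly = TRUE)) {
  t_agnes <- system.time(
    ag <- cluster::agnes(dd, method = "average")
  )[["elapsed"]]
  cl_agnes <- cutree(as.hclust(ag), h = 2)

  message("hclust: ", t_hclust, " s; agnes: ", t_agnes, " s")
  message("same partition at h = 2: ", same_partition(cl_hclust, cl_agnes))
}
```

With the same linkage and no tied distances, the two functions should produce the same partition, in which case the choice between them is purely a question of speed.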

mikemc, Mar 02 '20 21:03