
PhylogeneticDistanceMatrix.distances: return symmetric distances

Open · nick-youngblut opened this issue 5 years ago · 4 comments

It would be helpful if the user could call `PhylogeneticDistanceMatrix.distances(full=True)` to get back a vector or matrix of symmetric distances, instead of just the lower triangle (which lacks the diagonal).
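One reading of `full=True`, sketched here in plain Python: expand the strict lower triangle into an n x n list of lists, mirroring each distance and putting zeros on the diagonal. The helper `lower_triangle_to_full` is hypothetical, not a DendroPy function, and it assumes the distances are ordered row by row (d(1,0), d(2,0), d(2,1), ...). The actual order returned by `distances()` should be checked before relying on this.

```python
import math


def lower_triangle_to_full(lower):
    """Expand a strict lower triangle (no diagonal) into a full symmetric matrix.

    Assumes `lower` lists d(1,0), d(2,0), d(2,1), d(3,0), ... (row-major order).
    Returns a list of lists with zeros on the diagonal.
    """
    # Solve n * (n - 1) / 2 == len(lower) for n.
    n = (1 + math.isqrt(1 + 8 * len(lower))) // 2
    if n * (n - 1) // 2 != len(lower):
        raise ValueError("length is not a triangular number")
    full = [[0.0] * n for _ in range(n)]
    values = iter(lower)
    for i in range(1, n):
        for j in range(i):
            d = next(values)
            full[i][j] = d
            full[j][i] = d  # mirror into the upper triangle
    return full


print(lower_triangle_to_full([1.0, 2.0, 3.0]))
# [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
```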

nick-youngblut, Dec 16 '20 09:12

Right now, this method returns a list of distances.

If implemented, would you want it to return the concatenation of this list and `[0] * n`, where n is the number of taxa?
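The difference in size is easy to see with made-up numbers: for n taxa the strict lower triangle has n(n-1)/2 values, appending `[0] * n` gives n(n+1)/2 (the triangle plus the diagonal), while a full symmetric matrix has n² entries because each off-diagonal distance appears twice.

```python
n = 3                            # number of taxa (illustrative)
lower = [1.0, 2.0, 3.0]          # made-up strict lower triangle: n * (n - 1) // 2 values
with_diagonal = lower + [0] * n  # the concatenation proposed above

print(len(lower), len(with_diagonal), n * n)  # 3 6 9
```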

jeetsukumaran, Dec 17 '20 07:12

I guess the user can just build the symmetric matrix via:

```python
import numpy as np
taxa = t.taxon_namespace  # t: a dendropy.Tree; pdc = t.phylogenetic_distance_matrix()
np.array([pdc(t1, t2) for t2 in taxa for t1 in taxa]).reshape(len(taxa), len(taxa))
```

...but it would be nice to have a simpler method. In my case, I wanted a symmetric matrix (as shown above) that I could feed to scikit-learn for clustering.
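For the scikit-learn use case, estimators such as `DBSCAN` accept `metric="precomputed"`, in which case the input must be a square distance matrix. A small pure-Python check for the properties such a matrix is expected to have (square, symmetric, zero diagonal) might look like this. `is_symmetric_distance_matrix` and its tolerance are illustrative choices, not part of DendroPy or scikit-learn.

```python
def is_symmetric_distance_matrix(m, tol=1e-9):
    """Return True if `m` (a list of lists) is square, symmetric and zero on the diagonal."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False  # not square
    for i in range(n):
        if abs(m[i][i]) > tol:
            return False  # nonzero self-distance
        for j in range(i):
            if abs(m[i][j] - m[j][i]) > tol:
                return False  # asymmetric pair
    return True


print(is_symmetric_distance_matrix([[0.0, 1.5], [1.5, 0.0]]))  # True
print(is_symmetric_distance_matrix([[0.0, 1.5], [2.0, 0.0]]))  # False
```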

nick-youngblut, Dec 17 '20 13:12

Fair enough.

But again, what would be the expected return value of this method with this option (given that DendroPy does not require or use NumPy)?
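As one hypothetical answer that avoids a NumPy dependency, a `full=True` result could be the taxa in a fixed order plus a list of row lists; `numpy.array(rows)` turns a rectangular list of lists into a 2-D array directly. The function name and return shape below are assumptions for illustration, not DendroPy API. `dist` can be any callable taking two taxa, such as the `pdc` object used in this thread; the toy lookup table only lets the sketch run on its own.

```python
def symmetric_distance_rows(taxa, dist):
    """Hypothetical NumPy-free result for a full=True option.

    Returns the taxa in row/column order and an n x n list of lists
    with zeros on the diagonal.
    """
    taxa = list(taxa)
    n = len(taxa)
    rows = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            d = dist(taxa[i], taxa[j])  # each unordered pair is computed once
            rows[i][j] = rows[j][i] = d  # and mirrored
    return taxa, rows


# Toy stand-in for a distance callable: made-up distances keyed by unordered pair.
toy = {frozenset("AB"): 1.0, frozenset("AC"): 2.0, frozenset("BC"): 3.0}
order, rows = symmetric_distance_rows("ABC", lambda a, b: toy[frozenset((a, b))])
print(order)  # ['A', 'B', 'C']
print(rows)   # [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
```

Looking up each unordered pair once and mirroring it makes n(n-1)/2 calls to `dist` rather than the n² calls of the comprehension-based approach.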

jeetsukumaran, Dec 17 '20 13:12

Hmm... without a NumPy requirement, the user would have to convert the result to an array, e.g. via:

```python
import numpy
numpy.array([numpy.array(xi) for xi in x])  # x: the nested list a full=True option would return
```

...which is nearly as much work as:

```python
import numpy as np
taxa = t.taxon_namespace  # t: a dendropy.Tree; pdc = t.phylogenetic_distance_matrix()
np.array([pdc(t1, t2) for t2 in taxa for t1 in taxa]).reshape(len(taxa), len(taxa))
```

...so maybe such a feature would not actually be that helpful.

nick-youngblut, Dec 17 '20 14:12