phyr

Memory efficient version of phylogenetic diversity metrics

Open rdinnager opened this issue 7 years ago • 5 comments

A PSV/PSE/PSC/etc. version that can handle really big phylogenies. In the past I have tried to calculate PSV on a phylogeny with several hundred thousand tips, but R gives 'cannot allocate vector of size 150 GB', or some other ridiculously large value (presumably because it is trying to allocate a huge phylogenetic covariance matrix). Data like this are not so unusual anymore, with large metagenomics datasets, so I think a memory-efficient version would be really useful. I was thinking it could be done using the bigmemory and bigalgebra packages?
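
For scale, a dense n-by-n matrix of 8-byte doubles needs 8n² bytes, which is why the allocation grows so quickly with the number of tips. A minimal C++ sketch of that arithmetic (the function name `dense_matrix_bytes` is made up for illustration and is not part of any package mentioned here):

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to hold one dense n_tips-by-n_tips matrix of doubles.
// Hypothetical helper for illustration only.
std::uint64_t dense_matrix_bytes(std::uint64_t n_tips) {
    // Guard against overflow of n_tips * n_tips * 8 in 64 bits.
    assert(n_tips < (1ULL << 30));
    return n_tips * n_tips * sizeof(double);
}
```

For example, `dense_matrix_bytes(20000)` is 3.2e9 bytes (about 3 GiB), while 300,000 tips would need 7.2e11 bytes (about 670 GiB) for a single copy of the matrix.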

rdinnager, Feb 22 '18 23:02

Thanks @rdinnager for the issue!

I will reimplement PSV in C++ later; hopefully C++ will manage memory better. After that, I will test it on a large phylogeny and see what we need to handle such large trees.

daijiang, Feb 23 '18 15:02

Okay, that sounds like a good plan.

rdinnager, Feb 26 '18 23:02

Hi @rdinnager, I reimplemented psv in C++. It is now faster than picante::psv, but I am not sure whether it can handle several hundred thousand tips (probably not). The main bottleneck is the memory needed to store the species-by-species phylogenetic variance-covariance matrix for that many tips.
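
One possible way around that bottleneck, offered as a sketch under stated assumptions and not as what phyr or picante actually do, is to skip the matrix entirely. Using the usual definition PSV = (n·tr(C) − ΣC) / (n(n−1)), where C is the phylogenetic correlation matrix of the n species present, and assuming the tree is ultrametric with height H, tr(C) = n and ΣC = (1/H)·Σ_e len_e·k_e², where k_e is the number of present tips below edge e. That sum needs only one pass over the tree. The tree layout (`parent`, `edge_len`, `present`) and the function name `psv_from_tree` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: node i has parent parent[i] (-1 for the root) and sits
// below an edge of length edge_len[i]. present[i] marks tips found in the
// community (ignored for internal nodes). Assumes an ultrametric tree of
// height tree_height, so correlation = covariance / tree_height.
double psv_from_tree(const std::vector<int>& parent,
                     const std::vector<double>& edge_len,
                     const std::vector<bool>& present,
                     double tree_height) {
    const std::size_t n_nodes = parent.size();
    assert(edge_len.size() == n_nodes && present.size() == n_nodes);
    assert(tree_height > 0.0);

    // pending[v] = number of children of v that are not yet processed.
    std::vector<int> pending(n_nodes, 0);
    for (std::size_t v = 0; v < n_nodes; ++v)
        if (parent[v] >= 0) ++pending[static_cast<std::size_t>(parent[v])];

    // Seed the work stack with the tips and count the species present.
    std::vector<std::size_t> stack;
    std::vector<double> count(n_nodes, 0.0);  // present tips below each node
    double n = 0.0;
    for (std::size_t v = 0; v < n_nodes; ++v) {
        if (pending[v] != 0) continue;        // internal node
        stack.push_back(v);
        if (present[v]) { count[v] = 1.0; n += 1.0; }
    }
    assert(n >= 2.0);  // PSV is undefined for fewer than two species

    // Work from the tips towards the root, accumulating len_e * k_e^2,
    // which equals the sum of all entries of the covariance matrix.
    double cov_sum = 0.0;
    while (!stack.empty()) {
        const std::size_t v = stack.back();
        stack.pop_back();
        if (parent[v] < 0) continue;          // the root has no edge above it
        cov_sum += edge_len[v] * count[v] * count[v];
        const std::size_t p = static_cast<std::size_t>(parent[v]);
        count[p] += count[v];
        if (--pending[p] == 0) stack.push_back(p);
    }

    const double corr_sum = cov_sum / tree_height;
    return (n * n - corr_sum) / (n * (n - 1.0));
}
```

Memory use here is linear in the number of nodes rather than quadratic in the number of tips. The trade-off is the ultrametric assumption: for a non-ultrametric tree the correlation matrix is not a simple rescaling of the covariance matrix, so this shortcut would not reproduce the matrix-based result.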

daijiang, May 11 '18 14:05

Hey @daijiang @rdinnager, I'd recommend bigmemory since it's pretty simple to interface with using Rcpp (see here) and because it allows you to store matrices on disk. The latter is pretty important because even a direct C++ implementation that never copies such large matrices will exhaust RAM on most computers. I've played around with it, and it seemed pretty intuitive.

lucasnell, Jun 11 '18 13:06

Thanks @lucasnell. I will take a look at it later. Currently, the C++ version can handle a 20k by 20k matrix on my laptop, which is probably enough for most ecological studies. bigmemory would definitely be useful beyond that size.

daijiang, Jun 11 '18 13:06