phangorn

Problem when plotting a network from a large data set

Open jorgeamaya opened this issue 8 years ago • 4 comments

I have a fasta file with 844 sequences of length 14844 bp. My attempts to plot a simple network have been unsuccessful due to the following error.

library(phangorn)
alignment <- read.phyDat(file="myalignment.fasta",format="fasta",type="DNA")
dm <- dist.hamming(alignment)
nnet <- neighborNet(dm)
Error in numeric(max(p)) : vector size cannot be NA/NaN
In addition: Warning message:
In splits2design(x) :
  integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'

I have tried on both Windows and Linux with an up-to-date version of R. Is there a known explanation for this error? Is my data set too big?
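For reference, the warning text in the output above is base R's own message for integer overflow in `cumsum`, so it can be reproduced without phangorn. The sketch below is only an illustration with made-up values (the `counts` vector is hypothetical and not taken from `splits2design`): an integer running total that passes `.Machine$integer.max` becomes `NA`, while the same sum in double precision stays finite.

```r
# Hypothetical integer counts whose running total exceeds the 32-bit limit
# (.Machine$integer.max is 2147483647).
counts <- c(.Machine$integer.max, 1L)

# Integer cumsum overflows: the second element becomes NA and R warns
# "integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'".
overflowed <- suppressWarnings(cumsum(counts))

# Converting to double first keeps the running total finite.
safe <- cumsum(as.numeric(counts))

print(overflowed)
print(safe)
```

An `NA` produced this way, if later used as a vector length, would plausibly lead to the `vector size cannot be NA/NaN` error, although that link is an inference from the two messages and is not confirmed in this thread.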

On a separate note, are there plans to include a function that calculates Median Joining Networks as described in Bandelt et al. (1999)?

jorgeamaya, Dec 15 '16 16:12

Hi @jorgeamaya, neighborNet does not yet work for large datasets, so you will have to use SplitsTree for now. I am currently rewriting some of the functions to make neighborNet work for larger networks, and I will let you know when I have something working. I have not yet looked into Median Joining Networks; maybe I will give it a try if it seems easy to implement. Cheers, Klaus

KlausVigo, Dec 15 '16 21:12

Hello, I'm having the same problem as jorgeamaya: my NEXUS file has 1016 taxa by 324 bp.

Fortunately, I found this post, and I wonder whether a fix for big datasets already exists. If not, how many taxa can the neighborNet() function handle?

Thanks in advance.

ErnestoHuicochea, Apr 13 '18 16:04

Bump! This would be super cool.

taprs, May 23 '23 12:05

Hi all, so far the neighborNet algorithm is a naive O(n⁴) implementation. This probably explains why it works up to around 100 tips and runs out of memory soon after. I am working over the summer to get the memory consumption down to O(n³) or a bit below, so that it scales up to 1000 taxa.
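To give a rough sense of what O(n⁴) means for the data sets mentioned in this thread, here is a back-of-envelope sketch. It assumes a dense design matrix with one row per pair of taxa and one column per split of a circular ordering, both numbering n(n-1)/2. That layout and the helper name `dense_design_cells` are assumptions for illustration, not a description of phangorn's internals.

```r
# Hypothetical helper: number of cells in a dense (pairs x circular splits)
# matrix, where both dimensions are choose(n, 2). Grows as O(n^4).
dense_design_cells <- function(n) choose(n, 2)^2

# Taxon counts: roughly where it still works, plus the two reported data sets.
n_taxa <- c(100, 844, 1016)
cells <- dense_design_cells(n_taxa)

summary_table <- data.frame(
  taxa = n_taxa,
  cells = cells,
  # 8 bytes per double-precision cell, expressed in gigabytes (1e9 bytes)
  gigabytes = cells * 8 / 1e9,
  # TRUE where the cell count no longer fits in a 32-bit integer
  exceeds_int = cells > .Machine$integer.max
)
print(summary_table)
```

Under these assumptions, 100 taxa need about 0.2 GB, while 844 taxa would need roughly a terabyte, and the cell count for both reported data sets falls outside the 32-bit integer range.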

KlausVigo, Jul 03 '23 13:07