phangorn
Problem when plotting network out of big data set
I have a fasta file with 844 sequences of length 14844 bp. My attempts to plot a simple network have been unsuccessful due to the following error.
library(phangorn)
alignment <- read.phyDat(file="myalignment.fasta",format="fasta",type="DNA")
dm <- dist.hamming(alignment)
nnet <- neighborNet(dm)
Error in numeric(max(p)) : vector size cannot be NA/NaN
In addition: Warning message:
In splits2design(x) :
integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'
I have tried on both Windows and Linux with an up-to-date version of R. Is there a known explanation for this error? Is my data set too big?
On a separate note, are there plans to include a function that calculates Median Joining Networks as described in Bandelt et al. (1999)?
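The warning and the error in the report above look related: an integer `cumsum` that overflows returns `NA`, and an `NA` length cannot be used to allocate a vector. The following is a minimal base R sketch of that mechanism only. It does not call phangorn, the values are made up for illustration, and whether `splits2design` fails in exactly this way is an assumption.

```r
# Base R only; illustrative values, not taken from phangorn internals.
# An integer vector whose running sum exceeds the 32-bit integer range.
x <- c(.Machine$integer.max, 1L)

# cumsum() on integers warns and yields NA once the sum overflows.
int_sum <- suppressWarnings(cumsum(x))
print(int_sum)

# Converting to double first avoids the overflow, as the warning suggests.
dbl_sum <- cumsum(as.numeric(x))
print(dbl_sum)

# Using the NA result as a vector length makes the allocation fail.
alloc <- tryCatch(
  numeric(max(int_sum)),
  error = function(e) conditionMessage(e)
)
print(alloc)
```

If this is what happens inside the package, the failure would come from the size of the internal index vectors rather than from anything wrong in the alignment itself.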
Hi @jorgeamaya, neighborNet does not yet work for large datasets, so you will have to use SplitsTree for now. I am currently rewriting some of the functions to make neighborNet work for larger networks, and I will let you know when I have something working. I have not yet looked into Median Joining Networks; I may give it a try if it seems easy to implement. Cheers, Klaus
Hello, I'm having the same problem as jorgeamaya: my nexus file has 1016 taxa by 324 bp.
Fortunately, I found this post, and I wonder whether there is already a fix for working with big datasets. If not, how many taxa can the neighborNet() function handle?
Thanks in advance.
Bump! This would be super cool.
Hi all,
so far the neighborNet algorithm is a naive O(n⁴) implementation. This probably explains why it works up to around 100 tips and runs out of memory soon after. I am working over the summer to get the memory consumption down to O(n³) or a bit below, so that it scales up to 1000 taxa.
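To get a feel for why the jump from O(n⁴) to O(n³) matters at these sizes, here is a rough back-of-the-envelope sketch in base R. It assumes exactly n⁴ (or n³) entries of one 8-byte double each and ignores all constant factors, so the numbers are orders of magnitude, not measurements of phangorn. The helper `bytes_needed` is hypothetical and not part of any package.

```r
# Rough memory estimate: assumes one 8-byte double per entry, no constants.
# bytes_needed() is a hypothetical helper for this illustration only.
bytes_needed <- function(n, power) as.numeric(n)^power * 8

# Taxon counts mentioned in this thread.
n <- c(100, 844, 1016)

estimate <- data.frame(
  taxa  = n,
  gb_n4 = bytes_needed(n, 4) / 1e9,  # gigabytes under O(n^4)
  gb_n3 = bytes_needed(n, 3) / 1e9   # gigabytes under O(n^3)
)
print(estimate)

# n^4 for 844 taxa is also beyond R's 32-bit integer range.
exceeds_int <- 844^4 > .Machine$integer.max
print(exceeds_int)
```

Under these assumptions, 100 taxa need a bit under a gigabyte with n⁴ storage, while 844 taxa would need terabytes. With n³ storage the same data sets land in the range of a few gigabytes.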