ggdendro
Node stack overflow error in dendro_data
Hi,
We are trying to plot a dendrogram, but in some cases we get an error in the dendro_data function.
hcdata <- dendro_data(hc, type="rectangle")
Error in : node stack overflow No stack trace available
Is there any way I can get rid of this error?
Thanks, Priyanshu
Please provide a reproducible example
Hi ,
I have attached two sample files.
- mydata1.csv works fine
- mydata.csv gives error as mentioned above
Below is the code in which I read the data from the csv file:
scaledData <- read.csv("mydata1.csv", header = TRUE)
dhc <<- hclust(dist(scaledData), method = "average")
ddata <- dendro_data(dhc, type = "rectangle")
Yes, I can replicate the problem. I'm not going to be able to investigate any time soon.
Meanwhile you can try:
- Use plot(dhc) from the package rpart()
- Reduce your data set (maybe using sampling) and then plot using ggdendro
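A minimal sketch of the sampling workaround suggested above. The data frame here is synthetic stand-in data (the attached CSV files are not reproduced), and n_keep is a hypothetical sample size that would need tuning. The dendro_data() call is guarded so the snippet still runs if ggdendro is not installed.

```r
set.seed(42)

# Synthetic stand-in for the CSV data from the thread (assumption).
scaledData <- as.data.frame(matrix(rnorm(1000 * 4), ncol = 4))

# Hypothetical sample size: keep a random subset of the rows.
n_keep <- 50
idx <- sample(nrow(scaledData), n_keep)

# Cluster only the sampled rows, with the same linkage as in the thread.
dhc_small <- hclust(dist(scaledData[idx, ]), method = "average")

# Convert for ggplot2 only if ggdendro is available.
if (requireNamespace("ggdendro", quietly = TRUE)) {
  ddata <- ggdendro::dendro_data(dhc_small, type = "rectangle")
  print(head(ddata$segments))
}
```

Note that a dendrogram of a sample shows only the structure of that sample, so this may not be acceptable when every leaf matters.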
The issue is still present when dealing with large dendrograms.
Hi, the issue still persists. It is extremely problematic when analyzing genetic data. Any chance it will be resolved soon?
best, Thomas
Do you get the same problem when using rpart()?
Sorry for the late reply, I was very busy these last months.
Quote: "Do you get the same problem when using rpart() ?"
I am not sure how to use rpart, but will look it up.
The problem is with co-expression network analyses, which typically result in dendrograms with 20,000+ leaves. Plotting the dendrogram with the normal plotting function is not a problem. However, I would love to integrate this into ggplot2 since I think this is a really great way to write papers.
The issue is still present in the latest release. I tried to plot a dendrogram generated by hclust with roughly 14695 leaves. I think the problem lies with a recursive function call within the as.dendrogram function.
Correct. Since ggdendro doesn't do this computation, but rpart does, this can't be fixed in ggdendro, really. If you can demonstrate that rpart can do the computation but ggdendro can not, then I will investigate.
Sorry, I said rpart but of course you're using hclust. In either case, the dendrogram itself is generated by as.dendrogram in the stats package (part of base R). If the base stats package is causing the problem, then ggdendro can not help you solve this, unfortunately.
Still, if you can find a solution using base R, this will demonstrate that the problem lies with ggdendro, and then I can investigate.
Hi Andrie,
thanks for the prompt reply. Plotting the dendrogram with plot(hclustobject) is no problem at all. The function as.dendrogram works fine.
The culprit is the gg.plotNode function defined in the dendrogram_data function. I believe it is the recursive function call in lines 282-285 of dendrogram.R.
I hope that helps. Dendrograms of this size are nothing unusual in -omics data analysis, and it would be really nice to be able to plot dendrograms in ggplot. Plotting them using plot and reimporting the png file is - let's put it this way - suboptimal.
best, and thanks for your work ! Thomas
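To illustrate the general point about recursion raised above: a tree walker that calls itself once per level can exhaust R's stack on a very deep dendrogram, while a loop with an explicit stack keeps the call depth constant. The sketch below is a generic illustration using base R only. The function count_leaves_iter is a hypothetical name, and this is not the code used in ggdendro.

```r
# Count the leaves of a dendrogram without recursion, using an
# explicit stack of nodes that are still to be visited.
count_leaves_iter <- function(d) {
  stack <- list(d)
  n <- 0L
  while (length(stack) > 0) {
    # Pop the last node off the stack.
    node <- stack[[length(stack)]]
    stack[[length(stack)]] <- NULL
    if (is.leaf(node)) {
      n <- n + 1L
    } else {
      # Push every child branch; each is visited in a later iteration.
      for (i in seq_along(node)) {
        stack[[length(stack) + 1L]] <- node[[i]]
      }
    }
  }
  n
}

hc <- hclust(dist(USArrests), method = "average")
d <- as.dendrogram(hc)
count_leaves_iter(d)  # 50, one leaf per row of USArrests
```

The same pattern applies to any per-node work, such as collecting segment coordinates, at the cost of managing the traversal order by hand.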
Hi,
I'm running into the same issue. Is there any update on this issue?
Thanks a lot for your work! Luuk
Hi there, just wanted to bump this issue as I've had problems with it as well. As thommor pointed out, the issue is caused by this recursive call to gg.plotNode().
Attached is a plot from plot.hclust() on an hclust result that causes the issue. Not a very interesting tree, but I'm using dendro_data() in a package that sometimes runs into examples like this (which is also why I want the lighter dependency footprint of ggdendro vs dendextend). Many leaves branch off from the same node at height = 0, so I think what's going on in this case is that the recursive function adds a new layer to the call stack for each leaf until it hits the limit. The final error message is:
> ggdendro::dendro_data(path_clust)
Error: node stack overflow
Error during wrapup: node stack overflow
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
GitHub doesn't allow uploading RData files to issues, so I ran the hclust result through dput() and put it into a txt file.
I appreciate any help, ggdendro is awesome!
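One way to check whether a given hclust result is at risk is to measure how deep its merge tree is. This is a sketch under the assumption, taken from the description above, that a recursive walker needs roughly one stack level per tree level. merge_depth is a hypothetical helper, written as a loop so that it cannot overflow itself.

```r
# Depth of the merge tree of an hclust object, computed without recursion.
merge_depth <- function(hc) {
  m <- hc$merge
  depth <- integer(nrow(m))
  for (i in seq_len(nrow(m))) {
    kids <- m[i, ]
    # Negative entries are single observations (depth 0);
    # positive entries index earlier rows of the merge matrix.
    kid_depth <- ifelse(kids < 0, 0L, depth[pmax(kids, 1L)])
    depth[i] <- 1L + max(kid_depth)
  }
  max(depth)
}

# 200 identical rows: every merge happens at height 0.
x <- matrix(1, nrow = 200, ncol = 2)
hc <- hclust(dist(x), method = "average")
range(hc$height)
merge_depth(hc)
```

A depth close to the number of leaves indicates a chain-like tree, which is the shape most likely to hit the recursion limit.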
This problem happens because there are too many identical branches, and the recursive function calls exceed the stack limit. A workaround is to add a small random value to the original matrix so that there are not so many identical rows.
https://support.bioconductor.org/p/125023/
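A minimal sketch of that noise workaround in base R. The matrix is synthetic, and noise_sd is a hypothetical value chosen for illustration. It has to be small compared with the real distances in the data, otherwise the clustering itself changes.

```r
set.seed(1)  # make the noise reproducible

# Synthetic example: 100 identical rows, so all pairwise distances are 0.
x <- matrix(1, nrow = 100, ncol = 3)

# Hypothetical noise scale; pick something far below your real distances.
noise_sd <- 1e-6
x_noisy <- x + rnorm(length(x), sd = noise_sd)

# With distinct rows, merges no longer all happen at height 0.
hc <- hclust(dist(x_noisy), method = "average")
range(hc$height)
```

The result depends on the random seed, so the tree among former duplicates is arbitrary; only the structure above the noise level is meaningful.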
Hi @andrie,
The problem lies with ggdendro. Creating a dendrogram using base R works just fine. I have tried to look into the problem, and as was already stated, the problem lies with gg.plotNode calling itself. Adding noise to the data (as suggested by @crotoc) fixes the problem, but is not really ideal for me. Do you think you can look into this issue anytime soon?
Thank you in advance!
Best, Leon
I have pushed a fix to the main branch on github. Please can you install the latest version from github and let me know if this works on your real world data.
Using the example provided by @GuptaPriyanshu , I now get this plot output:
Re-opening, since there are some performance optimisation gains possible.
The example now runs in ~1.5 seconds on my machine.
Hi @andrie,
Thank you for fixing the issue. The latest GitHub version of ggdendro now works without returning an error. It runs in 864.351 seconds (about 14.4 minutes) on a dataset composed of 58723 observations.
Best, Leon
I am delighted to finally close this bug, after 6 years. Version 0.2.0 was accepted by CRAN on 2023-02-24.