ggdendro

Node stack overflow error in dendro_data

Open GuptaPriyanshu opened this issue 6 years ago • 12 comments

Hi,

We are trying to plot a dendrogram, but in some cases we get an error from the dendro_data function.

hcdata <- dendro_data(hc, type="rectangle")

Error in : node stack overflow
No stack trace available

Is there any way I can get rid of this error?

Thanks, Priyanshu

GuptaPriyanshu avatar Aug 01 '18 18:08 GuptaPriyanshu

Please provide a reproducible example

andrie avatar Aug 01 '18 18:08 andrie

Hi,

I have attached two sample files.

  1. mydata1.csv works fine
  2. mydata.csv gives error as mentioned above

Below is the code in which I read the data from the csv file:

scaledData <- read.csv("mydata1.csv", header = TRUE)
dhc <<- hclust(dist(scaledData), method = "average")
ddata <- dendro_data(dhc, type = "rectangle")

files.zip

GuptaPriyanshu avatar Aug 01 '18 21:08 GuptaPriyanshu

Yes, I can replicate the problem. I'm not going to be able to investigate any time soon.

Meanwhile you can try:

  • Use plot(dhc) from the package rpart()
  • Reduce your data set (maybe using sampling) and then plot using ggdendro
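The sampling workaround might look roughly like the sketch below. It is only an illustration: the simulated data frame stands in for the real CSV, and the sample size of 500 is an arbitrary assumption. The call to dendro_data() is guarded so the snippet also runs without ggdendro installed.

```r
# Sketch only: simulated data replaces the real CSV file, and
# n_keep = 500 is an arbitrary choice, not a recommended value.
set.seed(42)
scaledData <- as.data.frame(matrix(rnorm(5000 * 4), ncol = 4))

# Keep a random subset of rows so the resulting tree stays small.
n_keep <- 500
keep <- sample(nrow(scaledData), n_keep)
dhc_small <- hclust(dist(scaledData[keep, ]), method = "average")

# Convert the reduced tree for ggplot2, only if ggdendro is available.
if (requireNamespace("ggdendro", quietly = TRUE)) {
  ddata_small <- ggdendro::dendro_data(dhc_small, type = "rectangle")
}
```

Sampling changes the tree, so this is only useful when an overview of the clustering structure is enough.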

andrie avatar Aug 02 '18 01:08 andrie

The issue is still present when dealing with large dendrograms.

thommohr avatar Nov 28 '19 22:11 thommohr

Hi, the issue still persists. It is extremely problematic when analyzing genetic data. Any chance it will be resolved soon?

best, Thomas

thommohr avatar Dec 15 '19 20:12 thommohr

Do you get the same problem when using rpart() ?

andrie avatar Dec 16 '19 05:12 andrie

Sorry for the late reply, I was very busy these last months.

Quote: "Do you get the same problem when using rpart() ?"

I am not sure how to use rpart, but will look it up.

The problem is with co-expression network analyses, which typically result in dendrograms with 20,000+ leaves. Plotting the dendrogram with the normal plotting function is not a problem. However, I would love to integrate this into ggplot2, since I think this is a really great way to write papers.

thommohr avatar Feb 02 '20 19:02 thommohr

The issue is still present in the latest release. I tried to plot a dendrogram generated by hclust with roughly 14695 leaves. I think the problem lies with a recursive function call within the as.dendrogram function.

thommohr avatar Sep 15 '20 17:09 thommohr

Correct. Since ggdendro doesn't do this computation, but rpart does, this can't really be fixed in ggdendro. If you can demonstrate that rpart can do the computation but ggdendro cannot, then I will investigate.

andrie avatar Sep 15 '20 19:09 andrie

Sorry, I said rpart but of course you're using hclust. In either case, the dendrogram itself is generated by as.dendrogram in the stats package (part of base R). If the base stats package is causing the problem, then ggdendro cannot help you solve this, unfortunately.

Still, if you can find a solution using base R, this will demonstrate that the problem lies with ggdendro, and then I can investigate.
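One way to run that comparison is sketched below. It is a rough diagnostic, not part of ggdendro: try_step() is a hypothetical helper, and the small USArrests tree is only a stand-in for the hclust object that actually triggers the error. A real stack overflow may abort the session instead of being caught, so treat the result as indicative.

```r
# Sketch: run each conversion step separately and record which one fails.
# `hc` is a small stand-in; substitute the hclust object that triggers the error.
hc <- hclust(dist(USArrests), method = "average")

# Hypothetical helper: returns "ok" or the error message of the step.
try_step <- function(expr) {
  tryCatch({
    force(expr)
    "ok"
  }, error = function(e) conditionMessage(e))
}

# Base R step first, then the ggdendro step if the package is installed.
results <- c(as.dendrogram = try_step(as.dendrogram(hc)))
if (requireNamespace("ggdendro", quietly = TRUE)) {
  results["dendro_data"] <- try_step(ggdendro::dendro_data(hc, type = "rectangle"))
}
print(results)
```

If as.dendrogram reports "ok" while dendro_data reports an error on the same object, that points at ggdendro rather than base R.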

andrie avatar Sep 15 '20 19:09 andrie

Hi Andrie,

thanks for the prompt reply. Plotting the dendrogram with plot(hclustobject) is no problem at all. The function as.dendrogram works fine.

The culprit is the gg.plotNode function defined in the dendrogram_data function; I believe it is the recursive function call in lines 282-285 of dendrogram.R.

I hope that helps. Dendrograms of this size are nothing unusual in -omics data analysis, and it would be really nice to be able to plot dendrograms in ggplot. Plotting them using plot and reimporting the png file is - let's put it this way - suboptimal.

best, and thanks for your work ! Thomas

thommohr avatar Sep 15 '20 21:09 thommohr

Hi,

I'm running into the same issue. Is there any update on this issue?

Thanks a lot for your work! Luuk

ljwharbers avatar May 14 '21 15:05 ljwharbers

Hi there, just wanted to bump this issue as I've had problems with it as well. As thommohr pointed out, the issue is caused by this recursive call to gg.plotNode().

Attached is a plot from plot.hclust() on an hclust result that causes the issue. Not a very interesting tree, but I'm using dendro_data() in a package that sometimes runs into examples like this (which is also why I want the lighter dependency footprint of ggdendro vs dendextend). Many leaves branch off from the same node at height = 0, so I think what's going on in this case is that the recursive function adds a new layer to the call stack for each leaf until it hits the limit. The final error message is:

> ggdendro::dendro_data(path_clust)
Error: node stack overflow
Error during wrapup: node stack overflow
Error: no more error handlers available (recursive errors?); invoking 'abort' restart

GitHub doesn't allow uploading RData files to issues, so I ran the hclust result through dput() and put it into a txt file.

I appreciate any help, ggdendro is awesome!

path_clust_dput.txt

dendro_data_clust
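To illustrate the call-stack argument in general terms, the sketch below counts the leaves of a dendrogram in two ways. It is not the code of gg.plotNode() and not the fix that was eventually applied; the function names are made up for this example, and the small USArrests tree is only a stand-in. The recursive version needs one stack frame per level of the tree, while the iterative version keeps pending nodes in an ordinary list.

```r
# Sketch: two ways to walk a dendrogram, using a small stand-in tree.
hc <- hclust(dist(USArrests), method = "average")
dend <- as.dendrogram(hc)

# Recursive version: call depth grows with the depth of the tree.
count_leaves_recursive <- function(node) {
  if (is.leaf(node)) return(1L)
  sum(vapply(seq_along(node),
             function(i) count_leaves_recursive(node[[i]]),
             integer(1)))
}

# Iterative version: pending nodes are stored in a list, not on the call stack.
count_leaves_iterative <- function(root) {
  pending <- list(root)
  n <- 0L
  while (length(pending) > 0) {
    node <- pending[[length(pending)]]   # take the last pending node
    pending[[length(pending)]] <- NULL   # and remove it from the list
    if (is.leaf(node)) {
      n <- n + 1L
    } else {
      for (i in seq_along(node)) {
        pending[[length(pending) + 1L]] <- node[[i]]
      }
    }
  }
  n
}
```

Both functions return the same count on a small tree; on a very deep or very unbalanced tree only the recursive one depends on how much stack is available.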

andrewGhazi avatar Jan 24 '23 19:01 andrewGhazi

This problem happens because there are too many identical branches, and the recursive function overflows the stack limit. A workaround is to add a small random value to the original matrix, so that there are not so many identical rows.

https://support.bioconductor.org/p/125023/
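A minimal sketch of this noise workaround, under stated assumptions: the data are simulated so that many rows are identical, and the noise scale of 1e-6 times the overall standard deviation is an arbitrary choice. It only shows that the jitter removes the zero-height merges; it was not run against the data sets from this thread.

```r
# Sketch: simulated matrix with many duplicated rows (5 distinct rows, 300 total).
set.seed(1)
base_rows <- matrix(rnorm(20), ncol = 4)
x <- base_rows[sample(5, 300, replace = TRUE), ]

# Tree on the raw data: duplicates merge at height 0.
hc_raw <- hclust(dist(x), method = "average")

# Add noise that is tiny relative to the spread of the data (assumed scale).
eps <- 1e-6 * sd(x)
x_noisy <- x + rnorm(length(x), sd = eps)
hc_noisy <- hclust(dist(x_noisy), method = "average")

# Compare the number of zero-height merges before and after.
c(raw = sum(hc_raw$height == 0), noisy = sum(hc_noisy$height == 0))
```

The jitter alters the data slightly, so the resulting tree is an approximation of the original clustering.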

crotoc avatar May 15 '23 21:05 crotoc

Hi @andrie,

The problem lies with ggdendro. Creating a dendrogram using base R works just fine. I have tried to look into the problem, and as already stated, it lies with gg.plotNode calling itself. Adding noise to the data (as suggested by @crotoc) fixes the problem, but is not really ideal for me. Do you think you can look into this issue anytime soon?

Thank you in advance!

Best, Leon

leonfodoulian avatar Sep 28 '23 13:09 leonfodoulian

I have pushed a fix to the main branch on GitHub. Please can you install the latest version from GitHub and let me know if this works on your real-world data.

Using the example provided by @GuptaPriyanshu , I now get this plot output:

image

andrie avatar Sep 30 '23 14:09 andrie

Re-opening, since there are some performance optimisation gains possible.

andrie avatar Oct 01 '23 13:10 andrie

The example now runs in ~1.5 seconds on my machine.

andrie avatar Oct 01 '23 14:10 andrie

Hi @andrie,

Thank you for fixing the issue. The latest GitHub version of ggdendro now works without returning an error. It runs in 864.351 seconds (about 14.4 minutes) on a dataset composed of 58723 observations.

Best, Leon

leonfodoulian avatar Oct 02 '23 15:10 leonfodoulian

I am delighted to finally close this bug, after 6 years. Version 0.2.0 was accepted by CRAN on 2024-02-24.

andrie avatar Feb 23 '24 21:02 andrie