eulerr icon indicating copy to clipboard operation
eulerr copied to clipboard

very slow, Doesn't work with large datasets

Open adminy opened this issue 2 years ago • 6 comments

library(eulerr)


fit <- euler(c( "A" = 55, "B" = 810, "C" = 102, "D" = 364, "E" = 101, "F" = 24,
            "G" = 34, "H" = 61, "I" = 194, "J" = 107, "K" = 53, "L" = 75, "M" = 11,
            "N" = 65, "O" = 16, "P" = 82, "Q" = 13, "F&O" = 5, "G&O" = 5, "F&G" = 5,
            "D&I" = 47, "D&E" = 33, "K&L" = 9, "A&K" = 7, "K&N" = 7, "A&N" = 7, "A&L" = 7,
            "K&P" = 7, "A&P" = 7, "L&N" = 7, "N&P" = 7, "C&K" = 7, "A&C" = 7, "L&P" = 7,
            "E&G" = 6, "J&K" = 7, "A&J" = 7, "C&O" = 5, "G&H" = 4, "C&N" = 7, "J&N" = 7,
            "E&F" = 5, "C&F" = 5, "C&L" = 7, "J&L" = 7, "C&P" = 7, "J&P" = 7, "C&G" = 5,
            "D&K" = 15, "C&J" = 7, "A&D" = 12, "D&L" = 12, "E&O" = 3, "E&H" = 4, "C&I" = 6,
            "C&D" = 8, "D&H" = 7, "D&N" = 7, "D&P" = 7, "D&J" = 7, "C&E" = 3, "H&O" = 1,
            "B&D" = 15, "B&C" = 11, "F&H" = 1, "E&M" = 1, "B&E" = 8, "B&K" = 7, "I&K" = 2,
            "A&B" = 7, "B&N" = 7, "B&L" = 7, "B&P" = 7, "D&F" = 3, "B&J" = 7, "I&L" = 2,
            "C&H" = 1, "D&O" = 1, "D&G" = 1),
        shape = "ellipse")

# ^^^^^^^ takes forever
fit$stress
fit$diagError

plot(fit)

Meanwhile the browser Venn.js is instant, just less accurate.

Can there be some technique applied to speed it up or at least specify a stopping point for accuracy?

Thanks

adminy avatar Jan 07 '22 01:01 adminy

Hm, yes I agree that there should be an option to set tolerance and maximum number of iterations.

It looks like an easy fix too. I'm happy to consider pull requests if you or anyone else wants to contribute.

jolars avatar Jan 07 '22 07:01 jolars

I was going to parametrise it but I noticed my N is 131071. its stuck at getting stats:

stats::nlm(f = optim_final_loss,
                                 p = pars,
                                 areas = areas_disjoint,
                                 circle = circle,
                                 iterlim = 1e6)$estimate

Even with itterlim to 1 its still taking hours.

Either the algorithm is slow or there is a bug. Also I don't see how my N is so large with just 28 nodes.

adminy avatar Jan 07 '22 14:01 adminy

28 is not a small number, and I'm not actually sure that it makes sense to even try to use an Euler diagram here since the output is bound to be widely misleading. Finding good approximate Euler diagrams is hard even at 4 sets.

The problem is that eulerr tries to compute the areas of the 2^28 - 1 = 268435455 possible intersections in the diagram, which is obviously not going to end well.

It sounds like venn.js is doing something differently since it's as you say much faster, but I'm not really sure how it's possible to avoid considering all the intersections.

jolars avatar Jan 07 '22 15:01 jolars

Or, hm, actually it should absolutely be possible to alleviate this problem by ruling out some of the possible intersections, i.e. if A is not intersecting with C, then obviously A&C&whatever will also be empty, so yeah, this should be possible to fix (at least partially).

jolars avatar Jan 07 '22 15:01 jolars

Agreed. Its not the amount of nodes that matter, its the overlaps.

I was thinking to do something like start off with just one overlap and keep increasing the overlaps until I find the outlier overlap that's causing the algorithm to stress over finding a solution.

adminy avatar Jan 07 '22 16:01 adminy

I'm having a similar issue, any solution?

csijcs avatar Apr 06 '22 12:04 csijcs