Digraphs icon indicating copy to clipboard operation
Digraphs copied to clipboard

The handling of edge labels is sometimes very slow

Open wilfwilson opened this issue 4 years ago • 4 comments

UPDATE: Although in #509 I have resolved the specific problem highlighted in this issue, there are still ideas in this issue thread that I think should be addressed. Namely, other operations may have similar problems, and moreover, having to exclude multidigraphs is currently a massive time sink, and should be mitigated as much as possible.

In my opinion, adding edges to a digraph one-by-one is too slow (even to a mutable digraph, which we are starting to do more often, especially with our standard examples; this was even the point of adding mutable digraphs).

For example, let's use this method to try to construct a chain digraph with only 100,000 vertices and therefore 99,999 edges:

gap> n := 10 ^ 5;; D1 := EmptyDigraph(IsMutableDigraph, n);;
gap> for i in [1 .. n - 1] do DigraphAddEdge(D1, i, i + 1); od; time;
280944

Nearly 5 minutes is way too long! By contrast, doing this with the ChainDigraph function takes very little time:

gap> D2 := ChainDigraph(n);; time;
17
gap> D1 = D2;
true

When I interrupt the slow computation, I always get a backtrace that looks like this, so I presume that the slow part of adding the edge is the accounting to do with edge labels:

^CError, user interrupt in
  p := Position( OutNeighboursOfVertex( D, v ), w )
 ; at /Users/Wilf/gap/pkg/digraphs/gap/labels.gi:96 called from
SetDigraphEdgeLabel( D, src, ran, 1
 ); at /Users/Wilf/gap/pkg/digraphs/gap/oper.gi:188 called from
DigraphAddEdge( D1, i, i + 1 ); at *stdin*

Therefore I think we need a better implementation of edge labels!

Anyway, in this example, I have never explicitly given this digraph any edge labels, so it is frustrating that the package is spending so long worrying about keeping track of the information that every edge has label 1.

wilfwilson avatar Sep 17 '21 12:09 wilfwilson

Two suggestions:

  • We could have a custom method for adding edge labels, which is used when you add an edge. Suppose the out-neighbours of D are called out, and the list of edge labels is labels. Then adding the edge [x,y] to D amounts to adding y to the end of out[x] and adding 1 (the default edge label) to the end of label[x]. This is cheap. No searching needs to be done, and in particular, we can avoid the need for the expensive computation p := Position(OutNeighboursOfVertex(D, v), w).

  • We should be even more 'lazy' about constructing edge labels for a digraph. In particular, the edge labels of a digraph should only be initialised the first time that an edge label is set for that digraph (and even more extreme, that the first time an edge label that is not 1 is set for that digraph), not the first time that an edge label is read. If an edge label needs to be read, and the labels haven't been initialised, then we can return 1 (or a new wrapper, DIGRAPHS_DefaultEdgeLabel, which would be set to 1) without having to first initialise the labels. UPDATE: it seems that there is HaveEdgeLabelsBeenAssigned, handy.

In my above 5 minute example, I never care at all about the edge labels, so it is disappointing that that is what is slowly it down so very much.

wilfwilson avatar Oct 28 '21 09:10 wilfwilson

Ahh, the checking of IsMultiDigraph is also the problem (of course!).

If I get rid of both the check for IsMultiDigraph and the setting of the edge label, I get a 4,000 times speedup:

gap> n := 10 ^ 5;; D1 := EmptyDigraph(IsMutableDigraph, n);;
gap> for i in [1 .. n - 1] do DigraphAddEdge(D1, i, i + 1); od; time;
69

Only checking for IsMultiDigraph but not setting the edge label yields a roughly 2x speedup:

gap> n := 10 ^ 5;; D1 := EmptyDigraph(IsMutableDigraph, n);;
gap> for i in [1 .. n - 1] do DigraphAddEdge(D1, i, i + 1); od; time;
120078

Not checking for IsMultiDigraph (which is currently safe in this case, but not in general) but still setting the edge label has a similar speedup:

gap> n := 10 ^ 5;; D1 := EmptyDigraph(IsMutableDigraph, n);;
gap> for i in [1 .. n - 1] do DigraphAddEdge(D1, i, i + 1); od; time;
152421

Therefore, about half of the time is spent checking for IsMultiDigraph, and about half of the time is spent on the book-keeping for edge labels.

Therefore both of these things should be improved.

Why is the check for IsMultiDigraph even present here? Can it be avoided? Note that in this method, a digraph is a multidigraph after adding an edge if and only if it was either already a multidigraph, or a duplicate edge was just added. Surely this shouldn't need a whole test for IsMultiDigraph? (Note that we're talking about mutable digraphs here, so there are no known attributes).

wilfwilson avatar Oct 28 '21 09:10 wilfwilson

Stale issue message

github-actions[bot] avatar Jan 07 '22 01:01 github-actions[bot]

I'm just observing this now too in an example I am trying to run, I think we need something like HasEdgeLabels so that we can check if they exist and if they don't, then don't try setting them. I may do something about this, time permitting!

james-d-mitchell avatar Apr 05 '22 13:04 james-d-mitchell

Ah wait this already exists and is called HaveEdgeLabelsBeenAssigned, so I guess this is a matter of just using this more.

james-d-mitchell avatar Apr 05 '22 13:04 james-d-mitchell