rigraph icon indicating copy to clipboard operation
rigraph copied to clipboard

delete_edges bug and counterintuitive add_edges behaviour (and workaround)

Open jessexknight opened this issue 3 years ago • 20 comments

Describe the bug The documented behaviour of delete_edges is not correct, and add_edges does some counterintuitive things.

To reproduce

library('igraph')
set.seed(1)
G = make_ring(10)
E(G)$color='blue'
e = get.edgelist(G) # 10x2 matrix
i = sample(ecount(G),3) # random edges
print(e[i,]) # random edges' vertices
layout = layout_nicely(G) # save layout
plot(G,layout=layout); print(ecount(G))
G = delete_edges(G,apply(e[i,],1,paste0,collapse='|')) # OK
# G = delete_edges(G,c(t(e[i,]))) # incorrect (1)
plot(G,layout=layout); print(ecount(G))
# G = add_edges(G,apply(e[i,],1,paste0,collapse='|'),color='red') # error (2)
G = add_edges(G,c(t(e[i,])),color='red') # OK
plot(G,layout=layout); print(ecount(G))

correctly yields:

     [,1] [,2]
[1,]    9   10
[2,]    4    5
[3,]    7    8
[1] 10
[1] 7
[1] 10

However, the commented uses of delete_edges (1) and add_edges (2) do not work:

  1. delete_edges with the numeric argument removes twice as many edges as described in docs, yielding "10, 4, 7" for the last three outputs instead of "10, 7, 10".
  2. add_edges with the character argument throws an error Invalid vertex name(s). While this input format is not documented, I was expecting symmetry with delete_edges.
  3. Semi-related: it's a bit counterintuitive that passing the matrix get.edgelist directly to add_edges results in the wrong edges, since R iterates over matrix data by row then column. So, the matrix must be transposed (the c() above is actually not needed).

Version information igraph v1.3.2 on linux mint 20.1

jessexknight avatar Jul 16 '22 22:07 jessexknight

The documentation regarding delete_edges is rather (too) terse. It mentions the edgelist, but there are many formats that sometimes work and sometimes don't. Consider for example.

g1 <- graph_from_literal(A-B, B-C, C-D, D-E)
[1] A--B B--C C--D D--E

then

g2 <- delete_edges(g, get.edgelist(g1)) produces error message: Error in as.igraph.es(graph, edges) : Invalid edge names
g2 <- delete_edges(g1, A|B) produces error message            : Error: unexpected symbol in "delete_edges(g, A|B) produces"

g2 <- delete_edges(g1, c(1,2))   removes edge 1 and 2
g2 <- delete_edges(g1, 1|2)      removes edge between vertex 1 and 2
g2 <- delete_edges(g1, E(g1)[1]) removes the first edge

clpippel avatar Jul 17 '22 08:07 clpippel

delete_edges() interprets a numeric vector as a list of edge IDs. This is what "edge sequence" refers to in the documentation:

The edges to remove, specified as an edge sequence.

If you want to specify edges as pairs of vertices, use edges:

> g <- make_ring(5)

> g
IGRAPH c574109 U--- 5 5 -- Ring graph
+ attr: name (g/c), mutual (g/l), circular (g/l)
+ edges from c574109:
[1] 1--2 2--3 3--4 4--5 1--5

> delete_edges(g, edges(2,3, 3,4))
IGRAPH e02888b U--- 5 2 -- Ring graph
+ attr: name (g/c), mutual (g/l), circular (g/l)
+ edges from e02888b:
[1] 1--2 1--5

szhorvat avatar Jul 17 '22 20:07 szhorvat

@szhorvat, why is edge 4--5 deleted ?

clpippel avatar Jul 17 '22 22:07 clpippel

You are right, something seems to be wrong.

szhorvat avatar Jul 18 '22 06:07 szhorvat

It's just confusing. As you mentioned, delete_edges takes a series of edge IDs, not what you might expect namely "The edges to add, a vertex series with an even number of vertices." So a conversion from vertex IDs to edge IDs is required:

delete_edges(g, get.edge.ids(g, c(2,3, 3,4)))

clpippel avatar Jul 18 '22 08:07 clpippel

Another example of confusion: g <- graph(c(), 10) + edges(1:10) - edges(1:10)

clpippel avatar Jul 18 '22 08:07 clpippel

Yet another example:

g1 <- make_ring(7)
dd <- get.edgelist(g1)                   # get edgelist
g2 <- graph(c(), 7)
g2 <- add_edges(g2, dd)                  # and put it back 
stopifnot(isomorphic(g1, g2))            # surprise

[1] 1--2 2--3 3--4 4--5 5--6 6--7 1--7   # g1
[1] 1->2 3->4 5->6 1->2 3->4 5->6 7->7   # g2 

I have no clear solution for the above misunderstandings.

clpippel avatar Jul 18 '22 08:07 clpippel

@jessexknight Statement # G = delete_edges(G,c(t(e[i,]))) # incorrect (1) is indeed incorrect. It needs an extra conversion step:

H2 = delete_edges(G, get.edge.ids(G, c(t(e[i, ])))) # convert vertex series to edge id and delete edges

clpippel avatar Jul 18 '22 10:07 clpippel

There are lots of examples listed above that seem to be a source of confusion; however, it seems like none of the results are actually wrong, it's just that the documentation needs to clarify what should be used and how. I did not implement these functions so I'm also just exploring the options, but for me it seems like the general rule is that the second argument of delete_edges() should be an edge sequence, which is on its surface "just" a vector of numeric edge IDs. The only extra feature I've found is that any edge ID can be substituted with an "edge name", which is a string containing the indices of the two endpoints of the edge, concatenated by "|". E.g., delete_edges(g, c("2|3", "4|5")) deletes edges between vertices 2 and 3, and 4 and 5. I don't know how this treats named vertices or multiple edges between the same vertex pair.

However, there are special objects that can be used in conjunction with the - operator to make your life easier (and I'd say that probably delete_edges() is meant to be a low-level function and the original intent was to let people just use - instead):

  • You can wrap a list in edges() or edge() to declare explicitly that this is a list containing edge IDs, and then you can use something like g - edges(2, 3, 3, 4) and it will delete edges with IDs 2, 3 and 4 (not the edges between vertices 2-3 and 3-4). This is essentially equivalent to delete_edges(), and you can use the "source|target" notation in place of edge IDs here. So, if you want to remove edges between vertices 2-3 and 3-4, you can use edges("2|3", "3|4").
  • You can use path() to select a path of edges by declaring the vertex IDs on the path, e.g., g - path(2, 3, 4) deletes the edges between vertices 2-3 and 3-4.

help("igraph-minus") seems to provide a lot of details about the - operator.

ntamas avatar Jul 18 '22 10:07 ntamas

Also, just to keep the discussion clear, can we list examples here that seem to be working incorrectly, taking into acocunt my explanation above (i.e. this is how it is supposed to work)? The list should contain items where each item states what the graph looks like, what the edge deletion call looks like, what is the expected output (based on your understanding of the docs) and what is the observed output. Then we can go through them one by one to see if it is a genuine bug or a flaw in the documentation.

ntamas avatar Jul 18 '22 10:07 ntamas

Example showing that the interpretation of edge is inconsistent between deleting and adding edges. In the former, it is treated as a list of edge IDs. In the latter it is treated as pairs of vertex IDs.

> make_ring(4)
IGRAPH 34e2ca0 U--- 4 4 -- Ring graph
+ attr: name (g/c), mutual (g/l), circular (g/l)
+ edges from 34e2ca0:
[1] 1--2 2--3 3--4 1--4

> make_ring(4) + edge(1,3)
IGRAPH 61f5029 U--- 4 5 -- Ring graph
+ attr: name (g/c), mutual (g/l), circular (g/l)
+ edges from 61f5029:
[1] 1--2 2--3 3--4 1--4 1--3

> make_ring(4) + edge(1,3) - edge(1,3)
IGRAPH 631865e U--- 4 3 -- Ring graph
+ attr: name (g/c), mutual (g/l), circular (g/l)
+ edges from 631865e:
[1] 2--3 1--4 1--3

This is not inconsistent with the documentation of edge, which distinguishes these two cases explicitly:

When adding edges via +, all unnamed arguments of edge (or edges) are concatenated, and then passed to add_edges. They are interpreted as pairs of vertex ids, and an edge will added between each pair. Named arguments will be used as edge attributes for the new edges.

When deleting edges via -, all arguments of edge (or edges) are concatenated via c() and passed to delete_edges.

However, the difference in interpretation between the add and delete cases can be confusing. Personally, I did find it quite confusing. I did not find it surprising that add_edges and delete_edges take different arguments. But I was surprised that the same edges(1,3) was interpreted in an entirely different manner in two different contexts.

szhorvat avatar Jul 18 '22 10:07 szhorvat

In short, my personal misunderstanding was that edge represents a collection of edges in a context-independent manner. It does not. Instead, it merely disambiguates whether we are adding edges, vertices, or something else with the + / - operators. This disambiguation is the only function it provides. Does this sound right @ntamas ?

szhorvat avatar Jul 18 '22 10:07 szhorvat

@ntamas, You can use path() to select a path of edges by declaring the vertex IDs on the path, e.g., g - path(2, 3, 4) deletes the edges between vertices 2-3 and 3-4.

This is probably beyond the scope of this bug report. What I'm missing is the ability to remove edges using a "vertex sequence with even number of vertices". For example, to delete edges between 2,3 and 4,7: g - vseq( 1,2, 4, 7), where vseq returns a special object with class "igraph.even.vertex.seq" . Assuming it does not exists already.

Note: The path parameter is also implemented in E(g).

clpippel avatar Jul 18 '22 11:07 clpippel

Note that one serious difficulty with referring to edges through their endpoints is the handling of multigraphs. If we delete "(1,2)" from make_graph(c(1,2, 1,2)), will it delete a single edge or both edges connecting 1 and 2?

szhorvat avatar Jul 18 '22 11:07 szhorvat

@szhorvat, use the same logic as in path, e.g. delete one edge. See E {igraph} doc.

g <- graph_from_literal(A-B, A-B, simplify=FALSE)
g - path("A", "B")
[1] A--B

clpippel avatar Jul 18 '22 11:07 clpippel

@ntamas, Confusion can be reduced by adding more examples. The function delete_edges accepts objects with class="igraph.edge". An example to demonstrate this capability can be included:

g1 <- make_ring(7)
es <- E(g1)[1]
make_ring(8)|> delete_edges(edges(es)) -> g
g

clpippel avatar Jul 18 '22 11:07 clpippel

Another example:

## delete edges by vertex sequence
g1 <- make_ring(5)
g2 <- delete_edges(g1, get.edge.ids(g1, c(1,5, 4,5)))
g2

clpippel avatar Jul 18 '22 12:07 clpippel

Notes to my future self when I finally find some time to work on this:

  • edges() is an alias to edge()
  • edge() creates a special igraph.edge structure that is handled in its own branch in the +.igraph and -.igraph functions in R/operators.R
  • If we want to implement something like vseq() proposed above (is this the best name? How about vpairs()?), we should create an igraph.vertex.pairs structure similarly to igraph.edge, and add the appropriate handler branches in +.igraph and -.igraph

Until then, I'm adding an example with get.edge.ids() to delete_edges().

ntamas avatar Aug 05 '22 10:08 ntamas

I presume vpairs() is definitely less unambiguous then vseq().

I would like to add two examples to get.edge.ids. However I need a hint how / where to find the source for a pull request.

## multiple edges
## multi = FALSE, a single edge id is returned,
## as many times as corresponding pairs in the vertex series.
g <-  make_graph(rep(c(1,2), 5))
eis <- get.edge.ids(g, c(1,2, 1,2), multi=FALSE)
eis
E(g)[eis]

## multi = TRUE, as many different edges, if any,
## are returned as pairs in the vertex series.
eim <- get.edge.ids(g, c(1,2, 1,2, 1,2), multi=TRUE)
eim
E(g)[eim]

clpippel avatar Aug 06 '22 07:08 clpippel

First of all, you'll need to look at the dev branch because the master branch is auto-generated from the dev branch after each successful CI run.

The source of get.edge.ids is in R/interface.R and the docstring should be right above it.

ntamas avatar Aug 07 '22 20:08 ntamas