rigraph Indexing edge/vertex sequences is prone to name conflict

Describe the bug

When indexing into edge or vertex sequences, attribute names are interpreted specially, as documented: https://igraph.org/r/html/latest/igraph-es-indexing.html and https://igraph.org/r/html/latest/igraph-vs-indexing.html

E.g. if a graph has a name vertex attribute, then V(g)[name == "foo"] is equivalent to V(g)[V(g)$name == "foo"].

However, this makes it unsafe to use any variables within the indexing operator in library code. Library code should work with any graph, with any attribute names. There is always a risk that variable name will conflict with an attribute name.

This happened here: https://github.com/igraph/rigraph/issues/154

And we can find other susceptible uses of the [ by searching the source code. For example:

> g<-make_ring(5)
> g[[1,2]]
[[1]]
+ 1/5 vertex, from 302bf86:
[1] 2

> V(g)$j <- LETTERS[1:5]
> g[[1,2]]
Error in simple_vs_index(x, ii, na_ok) : Unknown vertex selected

This is because j happens to be used in the implementation of [[.igraph.

Another example:

> g<-make_ring(5)
> dfs(g, 1) # works
> V(g)$res <- 1:5
> dfs(g, 1) # now it's broken
Error in res$order : $ operator is invalid for atomic vectors

Version information

1.3.2

Jun 21 '22 07:06 szhorvat

@gaborcsardi This can't be dealt with without understanding the original design intentions, so your input would be welcome. Do you have any suggestions here? The feature that is causing the problem is very convenient when doing interactive work, but is unsafe for any library code that can't predict attribute names. Was the intention to avoid [ in library code? If so, what is the safe alternative?

Jun 21 '22 07:06 szhorvat

Here's a list of suspicious uses of the indexing operator:

plot.R
174:    v <- V(graph)[mark.groups[[g]]]

indexing.R
312:      lapply(adjacent_vertices(x, i, mode = mode), intersection, V(x)[j])

topology.R
77:  lapply(res, function(.x) V(graph2)[.x + 1])
134:  lapply(res, function(.x) V(graph1)[.x + 1])
187:  if (all.maps) res$maps <- lapply(res$maps, function(x) V(target)[x+1])

Notice that the first two uses are fixed (using .x) but the one on line 187 is not.

structural.properties.R
1678:    if (order)     res$order      <- V(graph)[res$order, na_ok = TRUE]
1679:    if (order.out) res$order.out  <- V(graph)[res$order.out, na_ok = TRUE]

This was grepped with the Silver Searcher as ag '^(?!#\').*(V|E)\$.+\$\\[', and irrelevant matches removed manually.

Jun 21 '22 12:06 szhorvat

I have fixed the above, except the ones in structural.properties.R. There I feel that a cleaner solution would be to introduce an extra argument next to na_ok, something like resolve_attrs = FALSE, and then this would prevent the addition of the attributes into the namespace where the [ operator is evaluated. Would you be okay with this name?

Jun 21 '22 13:06 ntamas

Commit 4ccd00f7 adds a resolve_attrs=... option to vertex / edge indexing so you can turn attribute lookup off in problematic cases.

Jun 21 '22 15:06 ntamas

@ntamas @szhorvat how about a solution in the spirit of rlang::.data? The problem seems similar -- looking-up symbols in the environment of vertex attributes first, then in the global environment, isn't it?

Jun 21 '22 15:06 mbojan

I'm not familiar with rlang::.data -- can you point me in the right direction in the docs to see how it is supposed to work?

Jun 21 '22 16:06 ntamas

Sure. The docs are at https://rdrr.io/cran/rlang/man/dot-data.html It's role (together with .env) is to disambiguate symbols in tidyversian verbs in which arguments can be expressions referring to columns in a data frame being processed but also objects in the global environment. Technically they are evaluated in a temporary environment of the data frame whose parent is, I believe, the calling environment. Wasn't the intention with [.igraph.vs (and friends) similar: allow for expressions in [ ] that refer directly to vertex/edge attributes of the processed graph without the need to use V() or E()? For example:

library(igraph)
data(UKfaculty, package = "igraphdata")
# Simply
s1 <- subgraph(UKfaculty, V(UKfaculty)[Group == 3])
# ... rather than:
s2 <- subgraph(UKfaculty, V(UKfaculty)[V(UKfaculty)$Group == 3])
identical_graphs(s1, s2)

My thought was to have a tidyverse-like solution here: evalute i and j in a temporary environment with vertex/edge attributes whose parent is the calling environment, together with, say, a .graph pronoun to disambiguate. Incidentally this will also address R CMD check complaints about expressions such as V(g)[some_attribute == 1] in package code containing unknown global variable some_attribute.

Jun 21 '22 17:06 mbojan

We could use some R expert advice sometimes ;-)

What happens in this framework when disambiguation is not used? Does it refer to the data by default, even without using .data? Or it's context dependent? Is it technically feasible for igraph to use the .env$foo notation as well without having to depend on rlang and without conflicting with rlang? Sorry about the basic questions, I'm not really an R user myself, I just happened to come across this issue.

Jun 21 '22 18:06 szhorvat

We could use some R expert advice sometimes ;-)

This advice comes with ABSOLUTELY NO WARRANTY ;-)

What happens in this framework when disambiguation is not used? Does it refer to the data by default, even without using .data? Or it's context dependent? Is it technically feasible for igraph to use the .env$foo notation as well without having to depend on rlang and without conflicting with rlang? Sorry about the basic questions, I'm not really an R user myself, I just happened to come across this issue.

The idea is to look for symbols among the attributes first, in the calling environment second. See the example below in which I managed without using rlang.

Here is a proof of concept that modifies igraph:::[.igraph.vs() in the above flavour and then patches dfs() to use it in order to avoid the error from issue description.

1 Patched `dfs()`

I modified two lines marked with comments in which I use .env:

my_dfs <- function (graph, root, mode = c("out", "in", "all", "total"), 
          unreachable = TRUE, order = TRUE, order.out = FALSE, father = FALSE, 
          dist = FALSE, in.callback = NULL, out.callback = NULL, extra = NULL, 
          rho = parent.frame(), neimode) 
{
  if (!is_igraph(graph)) {
    stop("Not a graph object")
  }
  if (!missing(neimode)) {
    warning("Argument `neimode' is deprecated; use `mode' instead")
    if (missing(mode)) {
      mode <- neimode
    }
  }
  root <- as.igraph.vs(graph, root) - 1
  mode <- switch(igraph.match.arg(mode), out = 1, `in` = 2, 
                 all = 3, total = 3)
  unreachable <- as.logical(unreachable)
  if (!is.null(in.callback)) {
    in.callback <- as.function(in.callback)
  }
  if (!is.null(out.callback)) {
    out.callback <- as.function(out.callback)
  }
  on.exit(.Call(C_R_igraph_finalizer))
  res <- .Call(C_R_igraph_dfs, graph, root, mode, unreachable, 
               as.logical(order), as.logical(order.out), as.logical(father), 
               as.logical(dist), in.callback, out.callback, extra, 
               rho)
  res$neimode <- res$mode
  if (order) 
    res$order <- res$order + 1
  if (order.out) 
    res$order.out <- res$order.out + 1
  if (father) 
    res$father <- res$father + 1
  if (igraph_opt("return.vs.es")) {
    if (order) 
      res$order <- V(graph)[.env$res$order, na_ok = TRUE]   # Use `.env` to disambiguate `res`
    if (order.out) 
      res$order.out <- V(graph)[.env$res$order.out, na_ok = TRUE]  # And here again
    if (father) 
      res$father <- create_vs(graph, res$father, na_ok = TRUE)
  }
  if (igraph_opt("add.vertex.names") && is_named(graph)) {
    if (father) 
      names(res$father) <- V(graph)$name
    if (dist) 
      names(res$dist) <- V(graph)$name
  }
  res
}

The above needs to be substituted in igraph namespace

environment(my_dfs) <- asNamespace("igraph")
assignInNamespace("dfs", my_dfs, "igraph")

2 Modify `igraph:::[.igraph.vs()`

I use trace() because it was easier to experiment with (no need to reload namespace over and over). What it does is essentially just after first line with the call to lazy_dots() insert a for loop to process the args list by substituting the environments associated with each argument with a an environment that

contains the vertex attributes
contains the original one as .env
has a reference to itself as .graph
has the original one as parent

trace(
  igraph:::'[.igraph.vs',
  tracer = quote({
    e <- as.environment(vertex.attributes(igraph:::get_vs_graph(x)))
    for(i in seq(along = args)) {
      e$.env <- args[[i]]$env
      e$.graph <- e
      parent.env(e) <- args[[i]]$env
      args[[i]]$env <- e
    }
  }),
  at = 3
  )

3 Test it

And I'm getting proper result:

g<-make_ring(5)
r1 <- dfs(g, 1) # works
## Tracing `[.igraph.vs`(V(graph), .env$res$order, na_ok = TRUE) step 3 
V(g)$res <- 1:5
r2 <- dfs(g, 1) # was broken
## Tracing `[.igraph.vs`(V(graph), .env$res$order, na_ok = TRUE) step 3 
# Had error
# Error in res$order : $ operator is invalid for atomic vectors
all.equal(r1, r2)
## [1] TRUE

Jun 21 '22 21:06 mbojan

Nice, I like the idea of the disambiguation prefixes now that I learned about them! I still need to read up a bit on R environments and stuff to get an understanding of it, but I like the idea of where it's going. I'll definitely keep this issue open and consider it for 1.4.0.

Jun 22 '22 08:06 ntamas

And if I understand correctly, this approach would not introduce any conflict with the .env symbol of the rlang package, right? It is indeed a nice solution.

Jun 22 '22 10:06 szhorvat

And if I understand correctly, this approach would not introduce any conflict with the .env symbol of the rlang package, right? It is indeed a nice solution.

That would have to be tested. I guess .graph and .env will have to be declared as globalVariables() in order not to trigger the "unknown variable" warninng during R CMD check (just like the .nei() and other similar functions in igraph). This may trigger name conflict message with rlang, but I'm not 100% sure at this moment whether it will be serious (i.e. break anything) because if used only in [] subsetting .graph.vs or .graph.es R should always find "ours" first...

Jun 22 '22 11:06 mbojan

In the docs you linked, the "Where does .data live?" seems to suggest that this should be fine, and that it's sufficient to declare it using globalVariables() but not necessary to use the rlang package.

Jun 22 '22 11:06 szhorvat

Yep...

Jun 22 '22 11:06 mbojan

@ntamas If we're going with the disambiguation solution, do you want to undo the addition of resolve_attrs?

Jun 23 '22 09:06 szhorvat

Yes, if we decide to go down this road, then resolve_attrs is not necessary. I'll experiment with this later today.

Jun 23 '22 09:06 ntamas

Okay, so if I understod correctly how it's supposed to work, commit 7b2aa7b2 does the trick and it is indeed nicer this way.

Jun 27 '22 17:06 ntamas

Does this go in 1.3.3, or does CRAN disallow new features in point releases?

Jun 27 '22 20:06 szhorvat

Also, is it now okay to use UTF-8 for the R docs? I noticed the ł character. If yes, we can update some other stuff as well.

Jun 27 '22 20:06 szhorvat

Ah, actually, I don't know. I wanted to be pedantic and get all the letters right, but maybe CRAN will complain. Shall we attempt a submission and then check whether it passes? :)

Also, I think it can go in 1.3.3; there is no strict requirement to follow semver, that's just a nice thing to have. We can always consider this as a bugfix ;)

Jun 27 '22 21:06 ntamas

Thanks for the 'ł' :) I think it is enough to set Encoding: UTF-8 in DESCRIPTION. There is also @encoding in Roxygen. In the worst case I'll be happy with a substitute 'l'.

Jun 27 '22 21:06 mbojan

It seems to me that there's nothing left to be done here. Re-open and clarify if you disagree.

Sep 25 '22 08:09 szhorvat

Indexing edge/vertex sequences is prone to name conflict

1 Patched dfs()

2 Modify igraph:::[.igraph.vs()

3 Test it

1 Patched `dfs()`

2 Modify `igraph:::[.igraph.vs()`