Indexing edge/vertex sequences is prone to name conflict
Describe the bug
When indexing into edge or vertex sequences, attribute names are interpreted specially, as documented: https://igraph.org/r/html/latest/igraph-es-indexing.html and https://igraph.org/r/html/latest/igraph-vs-indexing.html
E.g. if a graph has a
namevertex attribute, thenV(g)[name == "foo"]is equivalent toV(g)[V(g)$name == "foo"].
However, this makes it unsafe to use any variables within the indexing operator in library code. Library code should work with any graph, with any attribute names. There is always a risk that variable name will conflict with an attribute name.
This happened here: https://github.com/igraph/rigraph/issues/154
And we can find other susceptible uses of the [ by searching the source code. For example:
> g<-make_ring(5)
> g[[1,2]]
[[1]]
+ 1/5 vertex, from 302bf86:
[1] 2
> V(g)$j <- LETTERS[1:5]
> g[[1,2]]
Error in simple_vs_index(x, ii, na_ok) : Unknown vertex selected
This is because j happens to be used in the implementation of [[.igraph.
Another example:
> g<-make_ring(5)
> dfs(g, 1) # works
> V(g)$res <- 1:5
> dfs(g, 1) # now it's broken
Error in res$order : $ operator is invalid for atomic vectors
Version information
1.3.2
@gaborcsardi This can't be dealt with without understanding the original design intentions, so your input would be welcome. Do you have any suggestions here? The feature that is causing the problem is very convenient when doing interactive work, but is unsafe for any library code that can't predict attribute names. Was the intention to avoid [ in library code? If so, what is the safe alternative?
Here's a list of suspicious uses of the indexing operator:
plot.R
174: v <- V(graph)[mark.groups[[g]]]
indexing.R
312: lapply(adjacent_vertices(x, i, mode = mode), intersection, V(x)[j])
topology.R
77: lapply(res, function(.x) V(graph2)[.x + 1])
134: lapply(res, function(.x) V(graph1)[.x + 1])
187: if (all.maps) res$maps <- lapply(res$maps, function(x) V(target)[x+1])
Notice that the first two uses are fixed (using .x) but the one on line 187 is not.
structural.properties.R
1678: if (order) res$order <- V(graph)[res$order, na_ok = TRUE]
1679: if (order.out) res$order.out <- V(graph)[res$order.out, na_ok = TRUE]
This was grepped with the Silver Searcher as ag '^(?!#\').*(V|E)\\(.+\\)\\[', and irrelevant matches removed manually.
I have fixed the above, except the ones in structural.properties.R. There I feel that a cleaner solution would be to introduce an extra argument next to na_ok, something like resolve_attrs = FALSE, and then this would prevent the addition of the attributes into the namespace where the [ operator is evaluated. Would you be okay with this name?
Commit 4ccd00f7 adds a resolve_attrs=... option to vertex / edge indexing so you can turn attribute lookup off in problematic cases.
@ntamas @szhorvat how about a solution in the spirit of rlang::.data? The problem seems similar -- looking-up symbols in the environment of vertex attributes first, then in the global environment, isn't it?
I'm not familiar with rlang::.data -- can you point me in the right direction in the docs to see how it is supposed to work?
Sure. The docs are at https://rdrr.io/cran/rlang/man/dot-data.html It's role (together with .env) is to disambiguate symbols in tidyversian verbs in which arguments can be expressions referring to columns in a data frame being processed but also objects in the global environment. Technically they are evaluated in a temporary environment of the data frame whose parent is, I believe, the calling environment. Wasn't the intention with [.igraph.vs (and friends) similar: allow for expressions in [ ] that refer directly to vertex/edge attributes of the processed graph without the need to use V() or E()? For example:
library(igraph)
data(UKfaculty, package = "igraphdata")
# Simply
s1 <- subgraph(UKfaculty, V(UKfaculty)[Group == 3])
# ... rather than:
s2 <- subgraph(UKfaculty, V(UKfaculty)[V(UKfaculty)$Group == 3])
identical_graphs(s1, s2)
My thought was to have a tidyverse-like solution here: evalute i and j in a temporary environment with vertex/edge attributes whose parent is the calling environment, together with, say, a .graph pronoun to disambiguate. Incidentally this will also address R CMD check complaints about expressions such as V(g)[some_attribute == 1] in package code containing unknown global variable some_attribute.
We could use some R expert advice sometimes ;-)
What happens in this framework when disambiguation is not used? Does it refer to the data by default, even without using .data? Or it's context dependent? Is it technically feasible for igraph to use the .env$foo notation as well without having to depend on rlang and without conflicting with rlang? Sorry about the basic questions, I'm not really an R user myself, I just happened to come across this issue.
We could use some R expert advice sometimes ;-)
This advice comes with ABSOLUTELY NO WARRANTY ;-)
What happens in this framework when disambiguation is not used? Does it refer to the data by default, even without using
.data? Or it's context dependent? Is it technically feasible for igraph to use the.env$foonotation as well without having to depend onrlangand without conflicting withrlang? Sorry about the basic questions, I'm not really an R user myself, I just happened to come across this issue.
The idea is to look for symbols among the attributes first, in the calling environment second. See the example below in which I managed without using rlang.
Here is a proof of concept that modifies igraph:::[.igraph.vs() in the above flavour and then patches dfs() to use it in order to avoid the error from issue description.
1 Patched dfs()
I modified two lines marked with comments in which I use .env:
my_dfs <- function (graph, root, mode = c("out", "in", "all", "total"),
unreachable = TRUE, order = TRUE, order.out = FALSE, father = FALSE,
dist = FALSE, in.callback = NULL, out.callback = NULL, extra = NULL,
rho = parent.frame(), neimode)
{
if (!is_igraph(graph)) {
stop("Not a graph object")
}
if (!missing(neimode)) {
warning("Argument `neimode' is deprecated; use `mode' instead")
if (missing(mode)) {
mode <- neimode
}
}
root <- as.igraph.vs(graph, root) - 1
mode <- switch(igraph.match.arg(mode), out = 1, `in` = 2,
all = 3, total = 3)
unreachable <- as.logical(unreachable)
if (!is.null(in.callback)) {
in.callback <- as.function(in.callback)
}
if (!is.null(out.callback)) {
out.callback <- as.function(out.callback)
}
on.exit(.Call(C_R_igraph_finalizer))
res <- .Call(C_R_igraph_dfs, graph, root, mode, unreachable,
as.logical(order), as.logical(order.out), as.logical(father),
as.logical(dist), in.callback, out.callback, extra,
rho)
res$neimode <- res$mode
if (order)
res$order <- res$order + 1
if (order.out)
res$order.out <- res$order.out + 1
if (father)
res$father <- res$father + 1
if (igraph_opt("return.vs.es")) {
if (order)
res$order <- V(graph)[.env$res$order, na_ok = TRUE] # Use `.env` to disambiguate `res`
if (order.out)
res$order.out <- V(graph)[.env$res$order.out, na_ok = TRUE] # And here again
if (father)
res$father <- create_vs(graph, res$father, na_ok = TRUE)
}
if (igraph_opt("add.vertex.names") && is_named(graph)) {
if (father)
names(res$father) <- V(graph)$name
if (dist)
names(res$dist) <- V(graph)$name
}
res
}
The above needs to be substituted in igraph namespace
environment(my_dfs) <- asNamespace("igraph")
assignInNamespace("dfs", my_dfs, "igraph")
2 Modify igraph:::[.igraph.vs()
I use trace() because it was easier to experiment with (no need to reload namespace over and over). What it does is essentially just after first line with the call to lazy_dots() insert a for loop to process the args list by substituting the environments associated with each argument with a an environment that
- contains the vertex attributes
- contains the original one as
.env - has a reference to itself as
.graph - has the original one as parent
trace(
igraph:::'[.igraph.vs',
tracer = quote({
e <- as.environment(vertex.attributes(igraph:::get_vs_graph(x)))
for(i in seq(along = args)) {
e$.env <- args[[i]]$env
e$.graph <- e
parent.env(e) <- args[[i]]$env
args[[i]]$env <- e
}
}),
at = 3
)
3 Test it
And I'm getting proper result:
g<-make_ring(5)
r1 <- dfs(g, 1) # works
## Tracing `[.igraph.vs`(V(graph), .env$res$order, na_ok = TRUE) step 3
V(g)$res <- 1:5
r2 <- dfs(g, 1) # was broken
## Tracing `[.igraph.vs`(V(graph), .env$res$order, na_ok = TRUE) step 3
# Had error
# Error in res$order : $ operator is invalid for atomic vectors
all.equal(r1, r2)
## [1] TRUE
Nice, I like the idea of the disambiguation prefixes now that I learned about them! I still need to read up a bit on R environments and stuff to get an understanding of it, but I like the idea of where it's going. I'll definitely keep this issue open and consider it for 1.4.0.
And if I understand correctly, this approach would not introduce any conflict with the .env symbol of the rlang package, right? It is indeed a nice solution.
And if I understand correctly, this approach would not introduce any conflict with the
.envsymbol of the rlang package, right? It is indeed a nice solution.
That would have to be tested. I guess .graph and .env will have to be declared as globalVariables() in order not to trigger the "unknown variable" warninng during R CMD check (just like the .nei() and other similar functions in igraph). This may trigger name conflict message with rlang, but I'm not 100% sure at this moment whether it will be serious (i.e. break anything) because if used only in [] subsetting .graph.vs or .graph.es R should always find "ours" first...
In the docs you linked, the "Where does .data live?" seems to suggest that this should be fine, and that it's sufficient to declare it using globalVariables() but not necessary to use the rlang package.
Yep...
@ntamas If we're going with the disambiguation solution, do you want to undo the addition of resolve_attrs?
Yes, if we decide to go down this road, then resolve_attrs is not necessary. I'll experiment with this later today.
Okay, so if I understod correctly how it's supposed to work, commit 7b2aa7b2 does the trick and it is indeed nicer this way.
Does this go in 1.3.3, or does CRAN disallow new features in point releases?
Also, is it now okay to use UTF-8 for the R docs? I noticed the ł character. If yes, we can update some other stuff as well.
Ah, actually, I don't know. I wanted to be pedantic and get all the letters right, but maybe CRAN will complain. Shall we attempt a submission and then check whether it passes? :)
Also, I think it can go in 1.3.3; there is no strict requirement to follow semver, that's just a nice thing to have. We can always consider this as a bugfix ;)
Thanks for the 'ł' :) I think it is enough to set Encoding: UTF-8 in DESCRIPTION. There is also @encoding in Roxygen. In the worst case I'll be happy with a substitute 'l'.
It seems to me that there's nothing left to be done here. Re-open and clarify if you disagree.