graph_from_adj_list doesn't preserve self-edges
Describe the bug When creating a graph from an adjacency list, the argument mode = "total" results in a graph that does not faithfully preserve self-edges from the adjacency list.
To reproduce
adjlist = list(c(1,2,3), c(1,2,3), c(1,2,3), 4, 5, 6)
ig = igraph::graph_from_adj_list(adjlist, mode = "total")
igraph::neighbors(ig, 1)# This is 1 1 1 1 1 1 2 3, should be 1 2 3
igraph::neighbors(ig, 2) # This is 1 3, should be 1 2 3
Version information
- R/igraph version: 1.5.1
- R version: 4.3.1
- Operating system: Mac OS Ventura 13.2.1
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.3.1 magrittr_2.0.3 cli_3.6.1 tools_4.3.1 igraph_1.5.1 rstudioapi_0.15.0 rlang_1.1.1 pkgconfig_2.0.3
>
First of all, let me point out that your adjacency list is invalid. By default, graph_from_adj_list() assumes that undirected edges are present twice in the adjlist, once in each direction. This applies to self-loops as well. The correct adjlist would be:
adjlist <- list(c(1,1,2,3), c(1,2,2,3), c(1,2,3,3), c(4,4), c(5,5), c(6,6))
That said, the function should report an error instead of returning an invalid result. We'll look into implementing this error check.
Also, passing duplicate=FALSE as an argument to graph_from_adj_list() also produces a correct graph. So it's really the error check that's missing here.
Also, passing duplicate=FALSE as an argument to graph_from_adj_list() also produces a correct graph.
Well, not quite, as it will result in duplicated non-loop edges.
Fixing this will require some careful thought ... We probably need separate duplication control for loop and non-loop edges. In fact I think this makes very little sense for non-loop edges. If adjacency is indicated in one direction only for non-loop edges, then this is a directed adjlist, and one simply can't create an undirected graph from it. No need for a flag to hack around that, just create the directed graph, then convert.
For this particular graph, a workaround that can be used now is something like
as.undirected(graph_from_adj_list(adjlist, mode='out', duplicate = F), mode='mutual')
I think this is analogous to the situation with igraph_adjacency(), i.e. converting an adjacency matrix to a graph. I propose the following for 0.11:
- Remove the
duplicateparameter. Its usefulness for non-loop edges is dubious, and people can always do a conversion going through a directed graph first. - If using
IGRAPH_ALLyet some non-loop edges are not present in both direction, throw an error. - Add a standard
loopsparameter withNO_LOOPS,LOOPS_ONCEandLOOPS_TWICE, with the default beingLOOPS_TWICEfor consistent roundtripping through an adjlist representation. In directed mode,LOOPS_TWICEis not treated any differently fromLOOPS_ONCE, just like in other functions. This allowsLOOPS_TWICEto stay the default without having to do special casing.
For the record: the function corresponding to the graph_from_adj_list() function in R is igraph_adjlist() in the C core. I don't really like the name of this function either because we also have an igraph_adjlist_t data type with an igraph_adjlist_init() function and it's real easy to confuse the two.
Furthermore, igraph_adjlist() in the C core was originally meant to be able to convert an igraph_adjlist_t (which you could only create from another graph at that time) back to an igraph_t graph after slight modifications, and it was always implicitly assumed that the user was knowing what they are doing at the low level. igraph_adjlist() also takes an igraph_adjlist_t as an input and not a generic igraph_vector_int_list_t, which also didn't exist at that time.
So, my proposal would be as follows:
- Keep
igraph_adjlist()as is in the C core, but let's deprecate it and mark it for removal in 0.12 or so. - In parallel, create an
igraph_adjacency_listfunction, spelled out nicely (just like we do everywhere else apart from the sort-of-internaligraph_adjlist_tdata type), and make it accept anigraph_vector_int_list_t. This way the conversion from R would also be easier (I guess we'll have generic conversion functions from R lists of integers toigraph_vector_int_list_t), and we can implement the semantics that @szhorvat proposed above. - In the few places where we use
igraph_adjlist()in the C core and to cater for users who happened to use this function from C, we need to provide a replacement. But actually, now that we haveigraph_vector_int_list_t, we should probably refactorigraph_adjlist_tto contain anigraph_vector_int_list_tinternally, and then a replacement forigraph_adjlist()would be a trivial function that extracts the internaligraph_vector_int_list_tfrom anigraph_adjlist_tand feeds it through the newigraph_adjacency_list()function.
Thoughts?