igraph icon indicating copy to clipboard operation
igraph copied to clipboard

graph_from_adj_list doesn't preserve self-edges

Open drewtnguyen opened this issue 2 years ago • 9 comments

Describe the bug When creating a graph from an adjacency list, the argument mode = "total" results in a graph that does not faithfully preserve self-edges from the adjacency list.

To reproduce

    adjlist = list(c(1,2,3), c(1,2,3), c(1,2,3), 4, 5, 6)
    ig = igraph::graph_from_adj_list(adjlist, mode = "total")
    igraph::neighbors(ig, 1)# This is 1 1 1 1 1 1 2 3, should be 1 2 3
    igraph::neighbors(ig, 2) # This is 1 3, should be 1 2 3

Version information

  • R/igraph version: 1.5.1
  • R version: 4.3.1
  • Operating system: Mac OS Ventura 13.2.1
    > sessionInfo()
    R version 4.3.1 (2023-06-16)
    Platform: aarch64-apple-darwin20 (64-bit)
    Running under: macOS Ventura 13.2.1
    
    Matrix products: default
    BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
    LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
    
    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
    
    time zone: America/Los_Angeles
    tzcode source: internal
    
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base     
    
    loaded via a namespace (and not attached):
    [1] compiler_4.3.1    magrittr_2.0.3    cli_3.6.1         tools_4.3.1       igraph_1.5.1      rstudioapi_0.15.0 rlang_1.1.1       pkgconfig_2.0.3  
    > 

drewtnguyen avatar Aug 16 '23 18:08 drewtnguyen

First of all, let me point out that your adjacency list is invalid. By default, graph_from_adj_list() assumes that undirected edges are present twice in the adjlist, once in each direction. This applies to self-loops as well. The correct adjlist would be:

adjlist <- list(c(1,1,2,3), c(1,2,2,3), c(1,2,3,3), c(4,4), c(5,5), c(6,6))

That said, the function should report an error instead of returning an invalid result. We'll look into implementing this error check.

szhorvat avatar Aug 16 '23 18:08 szhorvat

Also, passing duplicate=FALSE as an argument to graph_from_adj_list() also produces a correct graph. So it's really the error check that's missing here.

ntamas avatar Aug 16 '23 18:08 ntamas

Also, passing duplicate=FALSE as an argument to graph_from_adj_list() also produces a correct graph.

Well, not quite, as it will result in duplicated non-loop edges.

Fixing this will require some careful thought ... We probably need separate duplication control for loop and non-loop edges. In fact I think this makes very little sense for non-loop edges. If adjacency is indicated in one direction only for non-loop edges, then this is a directed adjlist, and one simply can't create an undirected graph from it. No need for a flag to hack around that, just create the directed graph, then convert.

szhorvat avatar Aug 16 '23 18:08 szhorvat

For this particular graph, a workaround that can be used now is something like

as.undirected(graph_from_adj_list(adjlist, mode='out', duplicate = F), mode='mutual')

szhorvat avatar Aug 16 '23 18:08 szhorvat

I think this is analogous to the situation with igraph_adjacency(), i.e. converting an adjacency matrix to a graph. I propose the following for 0.11:

  • Remove the duplicate parameter. Its usefulness for non-loop edges is dubious, and people can always do a conversion going through a directed graph first.
  • If using IGRAPH_ALL yet some non-loop edges are not present in both direction, throw an error.
  • Add a standard loops parameter with NO_LOOPS, LOOPS_ONCE and LOOPS_TWICE, with the default being LOOPS_TWICE for consistent roundtripping through an adjlist representation. In directed mode, LOOPS_TWICE is not treated any differently from LOOPS_ONCE, just like in other functions. This allowsLOOPS_TWICE to stay the default without having to do special casing.

szhorvat avatar Aug 16 '23 19:08 szhorvat

For the record: the function corresponding to the graph_from_adj_list() function in R is igraph_adjlist() in the C core. I don't really like the name of this function either because we also have an igraph_adjlist_t data type with an igraph_adjlist_init() function and it's real easy to confuse the two.

Furthermore, igraph_adjlist() in the C core was originally meant to be able to convert an igraph_adjlist_t (which you could only create from another graph at that time) back to an igraph_t graph after slight modifications, and it was always implicitly assumed that the user was knowing what they are doing at the low level. igraph_adjlist() also takes an igraph_adjlist_t as an input and not a generic igraph_vector_int_list_t, which also didn't exist at that time.

So, my proposal would be as follows:

  • Keep igraph_adjlist() as is in the C core, but let's deprecate it and mark it for removal in 0.12 or so.
  • In parallel, create an igraph_adjacency_list function, spelled out nicely (just like we do everywhere else apart from the sort-of-internal igraph_adjlist_t data type), and make it accept an igraph_vector_int_list_t. This way the conversion from R would also be easier (I guess we'll have generic conversion functions from R lists of integers to igraph_vector_int_list_t), and we can implement the semantics that @szhorvat proposed above.
  • In the few places where we use igraph_adjlist() in the C core and to cater for users who happened to use this function from C, we need to provide a replacement. But actually, now that we have igraph_vector_int_list_t, we should probably refactor igraph_adjlist_t to contain an igraph_vector_int_list_t internally, and then a replacement for igraph_adjlist() would be a trivial function that extracts the internal igraph_vector_int_list_t from an igraph_adjlist_t and feeds it through the new igraph_adjacency_list() function.

Thoughts?

ntamas avatar Aug 17 '23 12:08 ntamas