rigraph icon indicating copy to clipboard operation
rigraph copied to clipboard

Memory leak when using shortest_paths

Open matt-wirtz opened this issue 6 years ago • 5 comments

Hello,

there is a memory leak when using shortest_paths algorithm.

library(igraph)
iterations <- 1000
g <-  make_tree(n = 5000, children = 3, mode = "out")

for (i in 1:length(V(g))) { V(g)[i]$name <- ifelse(i %% 10 == 0, "home", "away")}
for (i in 1:length(E(g))) { E(g)[i]$weight <- as.integer(runif(1, 1, 10)) }

print(sum(gc()[,4]))
for (i in 1:iterations) { 
  if (i %% 100 == 0) print(sum(gc()[,4]))
  sp <- shortest_paths(g, from = 3, to = V(g)[name == "home"], 
                   mode="out", output="vpath", weights = E(g)$weight) 
  }

The memory used by the R session is displayed after every 100 calls to shortest_paths. It increases more or less steadily. When using smaller graphs it takes more iterations to make the increase obvious. I tested it on ArchLinux and Windows using igraph 1.1.2.

I came across this issue in a lager simulation and there I ended up with an R session using 3GB while all objects within the R session just needed 5MB. After drilling down I found out that the calls to shortest_paths caused the memory leaks.

matt-wirtz avatar Jul 15 '18 17:07 matt-wirtz

Output in the dev version (1.3):

[1] 312.2
[1] 312.2
[1] 328.3
[1] 328.3
[1] 328.4
[1] 347.7
[1] 347.7
[1] 347.7
[1] 422.1
[1] 422.1

There are also a very large number of warnings. I wonder if it has to do with these.

szhorvat avatar Jan 10 '22 14:01 szhorvat

Interestingly enough, the leak is not there if I set igraph_options(return.vs.es=FALSE) - which means that the leak is not in the shortest path routine where I initially suspected it to be, but in create_vs().

ntamas avatar Jan 18 '22 10:01 ntamas

An even simpler way to trigger the issue (no graphs involved):

library(igraph)

iterations <- 500000

print(sum(gc()[,4]))
env <- new.env()
for (i in 1:iterations) {
  if (i %% 50000 == 0) {
    print(sum(gc()[,4]))
  }
  igraph:::make_weak_ref(env, NULL)
}

print("Calling GC after loop")
sum(gc()[,4])

print("Removing env from workspace")
rm(env)

print("Calling GC after removal")
sum(gc()[,4])

print("Exiting after last GC")

I'm just starting to understand weakrefs in R; it looks like igraph:::make_weak_ref(env, ...) (which is an R wrapper for the low-level R_MakeWeakRef function) returns an object that somehow remains alive even if I don't store it anywhere, and its memory is reclaimed only when env goes out of scope (or after calling rm(env)). The consequence is that calling V(g) many times in a loop allocates lots of these objects, and these are reclaimed only when g is destroyed. That's why we see an increasing memory consumption in the example above. I don't know whether this is by design or not; I need to investigate this more.

ntamas avatar Jan 18 '22 11:01 ntamas

Note to self: each igraph object contains a hidden environment that is used to record mutable meta-information about the graph without changing its identity. igraph vertex sequence objects (i.e. ones created with V(g)) contain a weak reference that points back to the environment object hidden in its parent graph. In turn, this environment object contains a key named me, which points back to the graph itself. This is how the vertex sequence object can get hold of the graph it refers to (it follows the weakref, and fetches the me key from the environment it points to). The idea is that when the underlying graph is removed from the workspace, its environment loses its last strong reference, which in turn invalidates the weak reference in the vertex sequence so the vertex sequence becomes "orphaned" automatically.

This seems to be the only place where weakrefs are used in the code, and this is the cause of the increase in memory usage. When V(g) is invoked, a new weakref is created, and this weakref is somehow not collected by the garbage collector when V(g) is discarded. The memory held by the weakref is released only when the key of the weakref is invalidated by the GC, even if no one needs the weakref any more.

Replacing the weakref with a strong reference resolves the leak, but this is only a workaround, not a solution for the underlying problem.


Update: the weakref behaviour seems to be documented.

A WeakRef object with a reachable key will not be garbage collected even if the WeakRef object is not itself reachable.

ntamas avatar Jan 19 '22 10:01 ntamas

Retargeting this to a later milestone as it's unlikely that we can sort this out before the release of 1.3.0 -- it would require us to find a replacement for the current weakref trickery, and this does not seem trivial.

ntamas avatar Mar 25 '22 16:03 ntamas

Is there any way for the user to fix this? My 1 TB RAM machine ran out of memory after a few loop iterations.

cdueben avatar Nov 15 '22 11:11 cdueben

A potential workaround is mentioned in https://github.com/igraph/rigraph/issues/288#issuecomment-1015296793

szhorvat avatar Nov 15 '22 11:11 szhorvat

Thanks. Though, after playing around with that suggestion for a few days, I have to report that it does not work (anymore). Maybe it used to work with previous versions of R, igraph, or its dependencies, but now the problem also occurs with igraph_options(return.vs.es = FALSE). Do you have any other recommendation?

cdueben avatar Nov 17 '22 10:11 cdueben

Can you show a complete minimal example that reproduces the issue? Maybe what you observe is not even the same problem. Run the example in a clean session and include the output of sessionInfo().

szhorvat avatar Nov 17 '22 10:11 szhorvat

I have built a series of minimal working examples since you posted your comment, but none of them reproduces the problem. While my original code keeps inflating RAM usage with each loop iteration, the examples' RAM consumption is much flatter.

Consider the following example as a representation of the basic structure:

example_function <- function(rst) {
  igraph::igraph.options(return.vs.es = FALSE)
  nc <- terra::ncell(rst)
  rst <- terra::adjacent(rst, 1:nc, "queen", TRUE)
  rst <- igraph::graph_from_edgelist(rst, directed = FALSE) |> 
    igraph::set_edge_attr("weight", value = runif(NROW(rst)))
  cl <- parallel::makeForkCluster(5)
  parallel::clusterEvalQ(cl, igraph::igraph.options(return.vs.es = FALSE))
  destinations <- sample(2:(nc - 10000L), 10L)
  nodes_rm <- lapply(1:30, function(x) sample.int(nc, 10000L))
  loop_over_graphs <- function(nr, i) {
    rst_i <- igraph::delete_vertices(rst, nr)
    dists <- lapply(destinations, function(d) {
      s <- igraph::shortest_paths(rst, 1L, d, output = "both", algorithm = "dijkstra")
      return(list(vapply(s$epath, function(S) sum(igraph::edge_attr(rst, "weight", S)), numeric(1L), USE.NAMES = FALSE), s$vpath))
    })
    list(do.call(c, lapply(dists, `[[`, 1L)), do.call(c, lapply(dists, `[[`, 2L))) |> 
      saveRDS(paste0(i, ".rds"))
    return(NULL)
  }
  parallel::clusterMap(cl, loop_over_graphs, nodes_rm, 1:30, SIMPLIFY = FALSE, USE.NAMES = FALSE)
  parallel::stopCluster(cl)
  return(NULL)
}
r <- terra::rast(nrows = 5000L, ncols = 8000L, vals = 1L)
example_function(r)

A function generates a graph from a grid and loops over it. In each iteration, it deletes vertices and computes shortest paths and their lengths on the modified graphs. This example is overly simplified. My original code is implemented as a package and e.g. omits the igraph::igraph.options(return.vs.es = FALSE) option in the first line of this example, as I set it through .onLoad. And despite the class of the returned values adhering to the igraph::igraph.options(return.vs.es = FALSE) option, my original code keeps inflating RAM consumption. Somehow, it keeps data in RAM that is not tied to any objects in the R environment at the time - neither objects in the master nor the worker processes.

As igraph does, to my knowledge, not allow for parallelization at the C level, I need to parallelize execution at the R level. The problem occurs irrespective of using PSOCK or FORK clusters. I tested on openSUSE Leap 15.3, Rocky Linux 8.6, and Windows 11, with Windows examples using PSOCK only. R versions are R 4.1.1, R 4.2.1, and R 4.2.2. All three machines employ igraph 1.3.5.

Sorry for not providing an example reproducing the problem. I hope that you might nonetheless have a clue of what might be the problem. Thanks.

cdueben avatar Dec 06 '22 14:12 cdueben

The warnings come from the fact that some nodes are not reachable. Suppressing the warnings doesn't seem to change the behavior.

We really need a reproducible example. I see the following with the current main with R 4.2.3:

library(igraph)

iterations <- 1500
g <- make_tree(n = 5000, children = 3, mode = "out")

# Enabling this fixes the problem
# igraph_options(return.vs.es=FALSE)

V(g)$name <- rep(c(rep("away", 9), "home"), length.out = length(V(g)))
E(g)$weight <- as.integer(runif(length(E(g)), 1, 10))

print(sum(gc()[, 4]))
#> [1] 140.5
for (i in seq_len(iterations)) {
  if (i %% 100 == 0) {
    print(system.time(print(sum(gc()[, 4]))))
  }
  suppressWarnings(
    shortest_paths(
      g,
      from = 3,
      to = V(g)[name == "home"],
      mode = "out",
      output = "vpath",
      weights = E(g)$weight
    )
  )
}
#> [1] 140.5
#>    user  system elapsed 
#>   0.031   0.001   0.031 
#> [1] 140.5
#>    user  system elapsed 
#>   0.037   0.000   0.037 
#> [1] 140.5
#>    user  system elapsed 
#>   0.045   0.000   0.044 
#> [1] 163.4
#>    user  system elapsed 
#>   0.052   0.001   0.053 
#> [1] 163.4
#>    user  system elapsed 
#>    0.06    0.00    0.06 
#> [1] 163.4
#>    user  system elapsed 
#>   0.072   0.002   0.074 
#> [1] 163.4
#>    user  system elapsed 
#>   0.083   0.000   0.084 
#> [1] 163.4
#>    user  system elapsed 
#>   0.098   0.000   0.098 
#> [1] 185.4
#>    user  system elapsed 
#>   0.112   0.001   0.113 
#> [1] 185.4
#>    user  system elapsed 
#>   0.124   0.000   0.124 
#> [1] 185.4
#>    user  system elapsed 
#>   0.138   0.000   0.139 
#> [1] 185.4
#>    user  system elapsed 
#>   0.152   0.001   0.154 
#> [1] 185.4
#>    user  system elapsed 
#>   0.165   0.001   0.166 
#> [1] 198.9
#>    user  system elapsed 
#>   0.191   0.001   0.192 
#> [1] 225.3
#>    user  system elapsed 
#>   0.200   0.002   0.202

Created on 2023-03-31 with reprex v2.0.2

Each gc() call takes longer, an indication for a memory leak indeed.

I don't understand yet why we even need weak references.

krlmlr avatar Mar 31 '23 04:03 krlmlr

My best understanding of the weakrefs is documented here in this comment. If I understand correctly, Gábor did not want a dangling vertex sequence object (i.e. V(g) or a subset of it) to keep the graph itself alive when the graph was already removed from the workspace. Does this help?

ntamas avatar Mar 31 '23 06:03 ntamas