
A proposal for an all_simple_paths function implementation

Open · i-aki-y opened this pull request 4 years ago · 9 comments

This PR is an implementation proposal for a function that searches for all simple paths. See also the related issue #1521.

I'm not familiar with the coding conventions of this project, so any feedback or suggestions are welcome.

I would appreciate it if you could check it out.

Overview

In this implementation, I followed the same strategy as NetworkX (Python's network analysis package). Here, I provide an all_simple_paths() function that is intended to be the public interface. This function returns an iterator that generates simple paths (i.e., paths with no repeated nodes). By "iterator", I mean an object that complies with the iteration protocol described in https://docs.julialang.org/en/v1/manual/interfaces/ .

In this PR I provide the following iteration methods:

Required methods

  • iterate(iter)
  • iterate(iter, state)

Some important optional methods

  • collect
  • length

Note that the main logic of the path search lives in the iterate() function. all_simple_paths() is intended to be the only entry point; the other structs and functions are meant to be used internally.
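For readers unfamiliar with that protocol, here is a rough, illustrative sketch of the overall shape: a small iterator struct, a mutable DFS state passed between iterate() calls, and a thin entry point. The names SimplePathsIter, SPState, and simple_paths_sketch are made up for this example and are not the actual code in this PR (which also implements length and the other optional methods listed above).

using LightGraphs

struct SimplePathsIter{T<:Integer,G<:AbstractGraph{T}}
    g::G
    source::T
    target::T
end

# Hypothetical entry point: just wraps the arguments in the iterator struct.
simple_paths_sketch(g, s, t) = SimplePathsIter(g, s, t)

# DFS bookkeeping carried between iterate() calls.
mutable struct SPState{T}
    path::Vector{T}      # current partial path
    idx::Vector{Int}     # next neighbor index to try at each depth
    onpath::BitVector    # vertices currently on the path
end

function Base.iterate(it::SimplePathsIter{T}) where {T}
    st = SPState{T}([it.source], [1], falses(nv(it.g)))
    st.onpath[it.source] = true
    return iterate(it, st)
end

function Base.iterate(it::SimplePathsIter{T}, st::SPState{T}) where {T}
    while !isempty(st.path)
        u = st.path[end]
        nbrs = outneighbors(it.g, u)
        if st.idx[end] > length(nbrs)          # all children tried: backtrack
            st.onpath[u] = false
            pop!(st.path); pop!(st.idx)
            continue
        end
        v = nbrs[st.idx[end]]
        st.idx[end] += 1
        st.onpath[v] && continue               # would repeat a vertex: skip
        if v == it.target                      # found a simple path: yield it
            return vcat(st.path, v), st
        end
        push!(st.path, v); push!(st.idx, 1)    # descend one level
        st.onpath[v] = true
    end
    return nothing
end

Base.IteratorSize(::Type{<:SimplePathsIter}) = Base.SizeUnknown()
Base.eltype(::Type{SimplePathsIter{T,G}}) where {T,G} = Vector{T}

With the two iterate methods in place, collect(simple_paths_sketch(complete_graph(4), 1, 4)) yields the same five paths as the example below, though possibly in a different order.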

Example

This is a simple example.

julia> using LightGraphs
julia> g = complete_graph(4)
julia> collect(all_simple_paths(g, 1, 4))
5-element Array{Array{Int64,1},1}:
 [1, 4]
 [1, 3, 4]
 [1, 3, 2, 4]
 [1, 2, 4]
 [1, 2, 3, 4]

Test cases

I ported the related test cases from NetworkX and rewrote them to fit LightGraphs' usage.

Performance

I compared this implementation with the original NetworkX implementation called through the PyCall package and observed roughly a 10x speedup, which seems reasonable. However, when I run the same NetworkX method directly in a Python environment (without PyCall), I get results 3-4x faster than this implementation. I have little experience with Julia performance tuning and have no idea where the difference comes from, so I'm happy to hear any advice or ideas for improving performance.

The following is the actual code used for the performance checks.

using LightGraphs
using PyCall

# count the paths by consuming the iterator, without storing them
function countpath(iter)
    c = 0
    for p in iter
        c += 1
    end
    return c
end

niter = 10
graphsize = 10
g_jl = complete_graph(graphsize)
# time this PR's implementation
@time (for i in 1:niter; countpath(all_simple_paths(g_jl, 1, [graphsize-1])); end)

nx = pyimport_conda("networkx", "networkx")
g_py = nx.complete_graph(graphsize)

# time NetworkX called from Julia through PyCall
@time (for i in 1:niter; countpath(nx.algorithms.simple_paths.all_simple_paths(g_py, source=1, target=[graphsize-1])); end)

I got the following results (the first @time is this implementation; the second is NetworkX via PyCall).

  1.725490 seconds (16.08 M allocations: 1.772 GiB, 9.16% gc time)
 23.670451 seconds (44.40 M allocations: 1.560 GiB, 4.19% gc time)

The following is Python code that can be run in IPython.

import networkx as nx

niter = 10
graphsize = 10
G = nx.complete_graph(graphsize)
# note: the fourth positional argument of all_simple_paths is cutoff (the maximum path length)
%time len([p for p in nx.algorithms.simple_paths.all_simple_paths(G, 1, graphsize-1, niter)])

The result is the following.

CPU times: user 592 ms, sys: 8.43 ms, total: 601 ms
Wall time: 601 ms

i-aki-y · Feb 20 '21 05:02

Codecov Report

Merging #1540 (2f13330) into master (05c80bb) will increase coverage by 0.00%. The diff coverage is 100.00%.

Note: Current head 2f13330 differs from pull request most recent head ead91c4. Consider uploading reports for the commit ead91c4 to get more accurate results.

@@           Coverage Diff           @@
##           master    #1540   +/-   ##
=======================================
  Coverage   99.44%   99.44%           
=======================================
  Files         106      107    +1     
  Lines        5551     5604   +53     
=======================================
+ Hits         5520     5573   +53     
  Misses         31       31           

codecov[bot] · Feb 20 '21 05:02

I'd be interested in this feature. What is the state here? Are there plans to merge it? Or should this go into a standalone package?

lassepe · May 06 '21 09:05

@lassepe Thank you for your comment. I’m hoping that this PR will be merged.

@sbromberger If there are any problems or missing information for you or members to start your review of this PR, please feel free to point them out.

i-aki-y · May 09 '21 04:05

I think that as long as we are iterating over the paths and not realizing them all in memory, it is a fine addition. I draw that distinction because users can CTRL-C interrupt things that take too long, but things that use too much space hit swap and make the system unresponsive. The docstring can note that the number of paths can vastly exceed the number of edges and that users should be careful when using the function on even medium-sized graphs.
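To make that distinction concrete, a caller of the proposed all_simple_paths iterator can bound the amount of work done without ever materializing the full set of paths. The snippet below is only an illustration of that usage, not part of the PR.

using LightGraphs

g = complete_graph(12)              # millions of simple paths between any two vertices
paths = all_simple_paths(g, 1, 12)  # lazy iterator; nothing is enumerated yet

# Look at only the first few paths; the search stops as soon as they are produced.
for p in Iterators.take(paths, 5)
    @show p
end

# Or stream over the paths and bail out once some condition is met.
for p in paths
    length(p) > 4 && break
end

# By contrast, collect(paths) would realize every path in memory at once.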

jpfairbanks · May 13 '21 11:05

I think that as long as we are iterating over the paths and not realizing them all in memory, it is a fine addition. I draw that distinction because users can CTRL-C interrupt things that take too long, but things that use too much space hit swap and make the system unresponsive.

I think this is a good compromise, but I note that this implementation appears to be |V|^2 + c|V| in memory. That seems to be a bit too much. On the other hand, we do have some functions that are |V|^2 (I avoid using these since they break on large graphs).

Do you have a specific recommendation with respect to including this code in base?

sbromberger · May 13 '21 12:05

I think including it with that complexity warning in the docstring is the right move. Since the algorithm can change without breaking the interface (iterating the iterator yields all the paths), the memory use can be fixed later if people use this and want to scale it up further. The set of paths is really big, so I imagine that scaling won't be a priority unless someone's research needs it.

I am in favor of inclusion with appropriate warnings.

jpfairbanks · May 13 '21 15:05

@sbromberger @jpfairbanks Many thanks for your comment and discussion!

I have fixed the code.

I think this is a good compromise, but I note that this implementation appears to be |V|^2 + c|V| in memory. That seems to be a bit too much. On the other hand, we do have some functions that are |V|^2 (I avoid using these since they break on large graphs).

It's just an idea at the moment, but I'm thinking that replacing the vectors in the stack with iterator state or something similar might improve the memory footprint. I will look into this further.

i-aki-y · May 17 '21 01:05

@sbromberger I rewrote the stack usage to improve memory efficiency. Now, each vector in the Stack{Vector{T}} holds only two elements (the parent node and the child index). I still use a vector as the stack element to keep it mutable. I think the total memory usage is now O(|V|).
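For readers following along, the frame layout described above could look roughly like this (an illustrative sketch only, not the PR's code):

using DataStructures: Stack

# Hypothetical frame layout: [current_vertex, next_child_index].
T = Int
stack = Stack{Vector{T}}()
push!(stack, T[1, 1])   # start a DFS at vertex 1; no child tried yet

frame = first(stack)    # peek at the top frame
frame[2] += 1           # advance the child index in place (the frame stays mutable)

# The stack holds at most one frame per vertex on the current path,
# so its memory use stays O(|V|).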

i-aki-y · May 18 '21 00:05

@sbromberger any news about merging this functionality into LightGraphs? It could still be very useful for small graphs. For example, I recently started rewriting the Dagitty R package for causal graph analysis in Julia, and it depends heavily on enumerating all paths. It would be great to have this logic in LightGraphs instead of reinventing the wheel in other packages.

Shmuma · Aug 21 '21 15:08