A proposal for an all_simple_paths function implementation
This PR proposes an implementation of a function that enumerates all simple paths. See also the related issue #1521.
I'm not yet familiar with this project's coding conventions, so any feedback or suggestions are welcome.
I would appreciate it if you could take a look.
Overview
In this implementation, I followed the same strategy used by NetworkX (Python's network analysis package).
Here, I provide an all_simple_paths() function that is intended to be the public interface.
It returns an iterator that generates simple paths (i.e., paths with no repeated nodes).
By "iterator" I mean an object that complies with the iteration protocol described in https://docs.julialang.org/en/v1/manual/interfaces/ .
In this PR I provide the following iteration methods:
Required methods
- iterate(iter)
- iterate(iter, state)
Some important optional methods
- collect
- length
Note that the main path-search logic lives in the iterate() function.
all_simple_paths() is intended to be the only entry point; the other structs and functions are meant to be used only internally.
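To make the intended interface concrete, here is a rough, self-contained sketch of the shape such an iterator can take. The type names (SimplePathIterator, SimplePathState) and the exact state layout are illustrative only and do not necessarily match the code in this PR.

using LightGraphs

# Illustrative iterator over simple paths from `source` to `target` in `g`.
struct SimplePathIterator{T<:Integer}
    g::AbstractGraph{T}
    source::T
    target::T
end

# The state holds the current path and, for each node on it,
# the index of the next neighbor to try (a depth-first search).
struct SimplePathState{T}
    path::Vector{T}
    nextidx::Vector{Int}
end

function Base.iterate(it::SimplePathIterator{T},
                      st::SimplePathState{T} = SimplePathState{T}([it.source], [1])) where {T}
    while !isempty(st.path)
        nbrs = outneighbors(it.g, st.path[end])
        i = st.nextidx[end]
        if i > length(nbrs)                 # all neighbors tried: backtrack
            pop!(st.path); pop!(st.nextidx)
            continue
        end
        st.nextidx[end] += 1                # remember where to resume
        child = nbrs[i]
        child in st.path && continue        # skip repeated nodes (keep the path simple)
        if child == it.target
            return vcat(st.path, child), st # yield a completed path
        end
        push!(st.path, child); push!(st.nextidx, 1)
    end
    return nothing                          # search space exhausted
end

Base.IteratorSize(::Type{<:SimplePathIterator}) = Base.SizeUnknown()
Base.eltype(::Type{SimplePathIterator{T}}) where {T} = Vector{T}

With iterate, IteratorSize, and eltype defined, collect and for loops work through Julia's generic iteration machinery; length can be provided as an additional method or obtained by counting.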
Example
This is a simple example.
julia> using LightGraphs
julia> g = complete_graph(4)
julia> collect(all_simple_paths(g, 1, 4))
5-element Array{Array{Int64,1},1}:
[1, 4]
[1, 3, 4]
[1, 3, 2, 4]
[1, 2, 4]
[1, 2, 3, 4]
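Because the return value is an iterator, the paths can also be consumed lazily without materializing them all. For example, continuing the session above (the output is simply the first two entries of the collect result shown above):

julia> for p in Iterators.take(all_simple_paths(g, 1, 4), 2)
           println(p)
       end
[1, 4]
[1, 3, 4]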
Test cases
I adapted the relevant test cases from NetworkX and rewrote them to fit LightGraphs' API.
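As an illustration of the style, a test exercising the complete-graph example from above could look roughly like this (a sketch, not the PR's actual test file):

using LightGraphs
using Test

g = complete_graph(4)
paths = collect(all_simple_paths(g, 1, 4))
@test length(paths) == 5                              # matches the example above
@test all(p -> first(p) == 1 && last(p) == 4, paths)  # every path runs from 1 to 4
@test all(allunique, paths)                           # simple paths have no repeated nodes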
Performance
I compared this implementation with the original NetworkX implementation called via the PyCall package.
This implementation was approximately 10x faster.
That does not seem bad, but when I ran the same NetworkX method directly in a Python environment (without PyCall), it was 3-4x faster than this implementation.
I have little experience with Julia performance tuning and am not sure where the difference comes from.
So I would be happy to hear any advice or ideas for improving performance.
The following is the actual code used for the performance check.
using LightGraphs
using PyCall

# Count the paths produced by an iterator without storing them.
function countpath(iter)
    c = 0
    for p in iter
        c += 1
    end
    return c
end

niter = 10
graphsize = 10

# LightGraphs implementation (this PR)
g_jl = complete_graph(graphsize)
@time (for i in 1:niter; countpath(all_simple_paths(g_jl, 1, [graphsize-1])); end)

# NetworkX implementation called through PyCall
nx = pyimport_conda("networkx", "networkx")
g_py = nx.complete_graph(graphsize)
@time (for i in 1:niter; countpath(nx.algorithms.simple_paths.all_simple_paths(g_py, source=1, target=[graphsize-1])); end)
I got the following result.
1.725490 seconds (16.08 M allocations: 1.772 GiB, 9.16% gc time)
23.670451 seconds (44.40 M allocations: 1.560 GiB, 4.19% gc time)
The following is Python code that can be run in IPython.
import networkx as nx
niter = 10
graphsize = 10
G = nx.complete_graph(graphsize)
%time len([p for p in nx.algorithms.simple_paths.all_simple_paths(G, 1, graphsize-1, niter)])
The result is as follows.
CPU times: user 592 ms, sys: 8.43 ms, total: 601 ms
Wall time: 601 ms
Codecov Report
Merging #1540 (2f13330) into master (05c80bb) will increase coverage by 0.00%. The diff coverage is 100.00%.
Note: the current head 2f13330 differs from the pull request's most recent head ead91c4. Consider uploading reports for the commit ead91c4 to get more accurate results.
@@            Coverage Diff            @@
##           master    #1540    +/-   ##
=========================================
  Coverage   99.44%   99.44%
=========================================
  Files         106      107      +1
  Lines        5551     5604     +53
=========================================
+ Hits         5520     5573     +53
  Misses         31       31
I'd be interested in this feature. What is the state here? Are there plans to merge it? Or should this go into a standalone package?
@lassepe Thank you for your comment. I’m hoping that this PR will be merged.
@sbromberger If anything is missing or unclear that keeps you or the other members from starting a review of this PR, please feel free to point it out.
I think that as long as we are iterating over the paths and not realizing them all in memory, it is a fine addition. I draw that distinction because users can CTRL-C interrupt things that take too long, but things that use too much space hit swap and make the system unresponsive. The docstring can note that the number of paths vastly exceeds the number of edges and that users should be careful when using the function on even medium-sized graphs.
> I think that as long as we are iterating over the paths and not realizing them all in memory, it is a fine addition. I draw that distinction because users can CTRL-C interrupt things that take too long, but things that use too much space hit swap and make the system unresponsive.
I think this is a good compromise, but I note that this implementation appears to be |V|^2 + c|V| in memory. That seems to be a bit too much. On the other hand, we do have some functions that are |V|^2 (I avoid using these since they break on large graphs).
Do you have a specific recommendation with respect to including this code in base?
I think including it with that complexity warning in the docstring is the right move. Since the algorithm can change without breaking the interface (iterating the iterator yields all the paths), the memory use can be fixed later if people use this and want to scale it up further. The set of paths is really big, so I imagine that scaling won't be a priority unless someone's research needs it.
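For concreteness, such a warning could take roughly the following shape in the docstring (the wording and the growth estimate are illustrative, not the PR's actual text):

"""
    all_simple_paths(g, source, target)

Return an iterator over all simple paths (paths with no repeated vertices)
from `source` to `target` in `g`.

!!! warning
    The number of simple paths can grow factorially with the size of the
    graph (a complete graph on n vertices has roughly e*(n-2)! simple paths
    between any two vertices), so prefer lazy iteration and avoid `collect`
    except on small graphs.
"""
function all_simple_paths end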
I am in favor of inclusion with appropriate warnings.
@sbromberger @jpfairbanks Many thanks for your comment and discussion!
I have fixed the code.
> I think this is a good compromise, but I note that this implementation appears to be |V|^2 + c|V| in memory. That seems to be a bit too much. On the other hand, we do have some functions that are |V|^2 (I avoid using these since they break on large graphs).
It's just an idea at the moment, but I'm thinking that replacing the vectors in the stack with iterator state or something similar might improve the memory footprint. I will look into this further.
@sbromberger I rewrote the stack usage to improve memory efficiency.
Now, each vector in the Stack{Vector{T}} holds only two elements (the parent node and an index into its children).
I still use a vector as a stack element to make the element mutable.
I think the total memory usage will be O(|V|).
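As a rough illustration of that layout (the numbers and variable names here are only an example, not the exact code in this PR):

using DataStructures            # provides Stack

T = Int
stack = Stack{Vector{T}}()
frame = T[3, 1]                 # [parent node, index of the next child to try]
push!(stack, frame)
frame[2] += 1                   # advance the index in place; a Vector (rather than
                                # a Tuple) is used so the frame can be mutated like this
# At most one such constant-size frame per vertex on the current path is live
# at a time, so the stack holds O(|V|) integers instead of full path copies.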
@sbromberger any news about merging this functionality into LightGraphs? It could still be very useful for small graphs. For example, I recently started rewriting the Dagitty R package for causal graph analysis in Julia, and it relies heavily on enumerating all paths. It would be great to have this logic in LightGraphs instead of reinventing the wheel in other packages.