mgmt
mgmt copied to clipboard
[LOVE] Implement graph isomorphism
This is an issue for a medium to advanced level golang programmer. It also requires a strong understanding of graph algorithms and the willingness to figure out how to implement a graph isomorphism algorithm which is apparently a very difficult problem. This is an algorithm that determines if two graphs are equivalent or not.
In mgmt we have an internal pgraph (graph) library. We currently have a GraphCmp
algorithm: https://github.com/purpleidea/mgmt/blob/master/pgraph/pgraph.go#L501 which takes a VertexCmp and EdgeCmp method as input, and compares two graphs.
It has some limitations, in that it fails if there are duplicate vertices in either graph. We'd like to replace this with a proper implementation based on a known algorithm. This is only useful for our tests, and isn't currently used in the running production code. As a result this is LOW priority.
This patch should also include some new tests!
Please leave a comment if you're going to work on this patch!
Thanks and happy hacking
Here are a few misc. references btw:
https://en.wikipedia.org/wiki/Graph_isomorphism_problem https://github.com/networkx/networkx/blob/master/networkx/algorithms/isomorphism/isomorphvf2.py https://www3.cs.stonybrook.edu/~algorith/implement/nauty/implement.shtml http://www.dharwadker.org/tevet/isomorphism/ https://github.com/igraph/igraph/tree/master/src/bliss
The idea is to implement something straight forward that works, not to implement the most efficient 100,000 vertex capable monster.
As I said, I'd like to give it a shot. Any additional discussion needed before I start working on this?
@AdnanLFC I think you're good to go! Feel free to ping if you come across any implementation or design questions that you'd like answered.
Thanks!
@AdnanLFC Any chance you're still working on this? Please let us know either way, thanks!
Got pulled away by some other things... :disappointed: However, I might give it another try these days.
I'll take this.
@keur Awesome-- one disclosure: this is a hard problem (you've been warned) and it's not 100% clear what's blocking us. Last I looked, this might be helpful...
https://github.com/purpleidea/mgmt/blob/f342e06ef0d0fc173dcb5320e9a376f14d561868/lang/interpret_test.go#L324
https://github.com/purpleidea/mgmt/blob/f342e06ef0d0fc173dcb5320e9a376f14d561868/lang/lang_test.go#L884
I thought I was stuck there. Have a look and see if you think graph isomorphism is necessary.
My secret: I'm a terrible algorithmist :P Thanks for looking into this!
After some more digging, this is very low priority, since it's only needed for tests at the moment.
Some notes:
-
We can't choose any algorithm that could fail even if that's an infinitesimal probability. It's just not clean.
-
Even if something is NP-complete, doesn't mean that it won't be able to complete the execution in a reasonable amount of time for the size of graphs that we have. Whether our graphs will have 100's, 1000's, 10,000's of vertices, I don't know, but I'm optimistic that we can perform the needed computations fast enough.
-
As you might have already seen, there are multiple DAG's in the code base. (I count 3 atm.) The function graph, and the resource graph are the two main ones that we need to be careful about.
In the resource graph (that the engine runs) we currently allow duplicate resources, but this is a bug. We should consider either error-ing there and/or de-duplicating equivalent resources. This is an issue that can be seen if we use the yaml
front-end eg:
---
graph: mygraph
resources:
file:
- name: file0
path: "/tmp/f0"
content: |
i am f0
state: exists
- name: file0
path: "/tmp/f0"
content: |
i am f1
state: exists
edges: []
This shouldn't work, but it does :/ Badly. They fight. Oops.
This doesn't happen in mcl
code, because it's clever enough to de-duplicate output!
The function graph however can have identical (same .String()
values, different pointers of course) vertices. We then ultimately run interpret
on that AST, and get the resource graph. At this stage we ensure that we don't have duplicate resources.
So it actually seems that GraphCmp
is very useful for tests (where we compare a function graph with identical vertices) but not actually used anywhere in the production code. The most similar function to GraphCmp
is GraphSync
, which fortunately doesn't need to deal with duplicate vertices since this is only used in the engine at the moment where this is supposed to be unique :)
So this patch is useful, but only for tests atm.