mgmt icon indicating copy to clipboard operation
mgmt copied to clipboard

[LOVE] Implement graph isomorphism

Open purpleidea opened this issue 7 years ago • 8 comments

This is an issue for a medium to advanced level golang programmer. It also requires a strong understanding of graph algorithms and the willingness to figure out how to implement a graph isomorphism algorithm which is apparently a very difficult problem. This is an algorithm that determines if two graphs are equivalent or not.

In mgmt we have an internal pgraph (graph) library. We currently have a GraphCmp algorithm: https://github.com/purpleidea/mgmt/blob/master/pgraph/pgraph.go#L501 which takes a VertexCmp and EdgeCmp method as input, and compares two graphs.

It has some limitations, in that it fails if there are duplicate vertices in either graph. We'd like to replace this with a proper implementation based on a known algorithm. This is only useful for our tests, and isn't currently used in the running production code. As a result this is LOW priority.

This patch should also include some new tests!

Please leave a comment if you're going to work on this patch!

Thanks and happy hacking

purpleidea avatar Jul 01 '17 20:07 purpleidea

Here are a few misc. references btw:

https://en.wikipedia.org/wiki/Graph_isomorphism_problem https://github.com/networkx/networkx/blob/master/networkx/algorithms/isomorphism/isomorphvf2.py https://www3.cs.stonybrook.edu/~algorith/implement/nauty/implement.shtml http://www.dharwadker.org/tevet/isomorphism/ https://github.com/igraph/igraph/tree/master/src/bliss

The idea is to implement something straight forward that works, not to implement the most efficient 100,000 vertex capable monster.

purpleidea avatar Jul 01 '17 20:07 purpleidea

As I said, I'd like to give it a shot. Any additional discussion needed before I start working on this?

AdnanLFC avatar Jul 28 '17 08:07 AdnanLFC

@AdnanLFC I think you're good to go! Feel free to ping if you come across any implementation or design questions that you'd like answered.

Thanks!

purpleidea avatar Jul 28 '17 08:07 purpleidea

@AdnanLFC Any chance you're still working on this? Please let us know either way, thanks!

purpleidea avatar Jan 02 '18 22:01 purpleidea

Got pulled away by some other things... :disappointed: However, I might give it another try these days.

AdnanLFC avatar Jun 18 '18 17:06 AdnanLFC

I'll take this.

kkuehlz avatar Dec 19 '18 11:12 kkuehlz

@keur Awesome-- one disclosure: this is a hard problem (you've been warned) and it's not 100% clear what's blocking us. Last I looked, this might be helpful...

https://github.com/purpleidea/mgmt/blob/f342e06ef0d0fc173dcb5320e9a376f14d561868/lang/interpret_test.go#L324

https://github.com/purpleidea/mgmt/blob/f342e06ef0d0fc173dcb5320e9a376f14d561868/lang/lang_test.go#L884

I thought I was stuck there. Have a look and see if you think graph isomorphism is necessary.

My secret: I'm a terrible algorithmist :P Thanks for looking into this!

purpleidea avatar Dec 19 '18 11:12 purpleidea

After some more digging, this is very low priority, since it's only needed for tests at the moment.

Some notes:

  1. We can't choose any algorithm that could fail even if that's an infinitesimal probability. It's just not clean.

  2. Even if something is NP-complete, doesn't mean that it won't be able to complete the execution in a reasonable amount of time for the size of graphs that we have. Whether our graphs will have 100's, 1000's, 10,000's of vertices, I don't know, but I'm optimistic that we can perform the needed computations fast enough.

  3. As you might have already seen, there are multiple DAG's in the code base. (I count 3 atm.) The function graph, and the resource graph are the two main ones that we need to be careful about.

In the resource graph (that the engine runs) we currently allow duplicate resources, but this is a bug. We should consider either error-ing there and/or de-duplicating equivalent resources. This is an issue that can be seen if we use the yaml front-end eg:

---
graph: mygraph
resources:
  file:
  - name: file0
    path: "/tmp/f0"
    content: |
      i am f0
    state: exists
  - name: file0
    path: "/tmp/f0"
    content: |
      i am f1
    state: exists
edges: []

This shouldn't work, but it does :/ Badly. They fight. Oops.

This doesn't happen in mcl code, because it's clever enough to de-duplicate output!

The function graph however can have identical (same .String() values, different pointers of course) vertices. We then ultimately run interpret on that AST, and get the resource graph. At this stage we ensure that we don't have duplicate resources.

So it actually seems that GraphCmp is very useful for tests (where we compare a function graph with identical vertices) but not actually used anywhere in the production code. The most similar function to GraphCmp is GraphSync, which fortunately doesn't need to deal with duplicate vertices since this is only used in the engine at the moment where this is supposed to be unique :)

So this patch is useful, but only for tests atm.

purpleidea avatar Dec 29 '18 00:12 purpleidea