graphs icon indicating copy to clipboard operation
graphs copied to clipboard

Interest in other graph libraries

Open travitch opened this issue 6 years ago • 7 comments

Thanks for this - graph library performance is definitely not obvious. Do you have any interest in additional graph libraries to examine? I was a bit frustrated with fgl performance a while ago and put together a library that I thought would address most of the problems I had with it. It might be of interest to you: https://github.com/travitch/haggle (unfortunately, not up on hackage yet).

It has:

  • A few different data representations to accommodate different graph topologies, some of which are very space efficient
  • Graphs with/without edge and node labels
  • Optional mutable graph creation (in IO/ST) via a freeze and thaw interface

travitch avatar Jul 06 '18 16:07 travitch

Do you have any interest in additional graph libraries to examine?

Yes of course, thanks for the link !

The approach is very interesting, I will add benchs for haggle asap :)

nobrakal avatar Jul 06 '18 17:07 nobrakal

@travitch I just spent some time trying to add your library. Some questions, to ensure that I don't miss anything:

  • If I want to deal with simple directed graph (like the graphs of FGL), is Data.Graph.Haggle.SimpleBiDigraph the best choice ?
  • If yes, if there a possible NFData instance for MSimpleBiDigraph and SimpleBiDigraph ? Maybe the first can be made by freezing the mutable graph and then use a the one defined we can define for SimpleBeDiGraph ?
  • You don't provide anything like mkGraph, so I wrote:
mkGraph e = let maxV = extractMaxVertex e
        in do
          g <- newSizedMSimpleBiDigraph maxV undefined
          vert <- replicateM maxV (addVertex g)
          mapM_ (\(u,v) -> addEdge g (vert !! u)(vert !! v)) e
          return g

where e is a list of edges (of type [(Int,Int)] and maV is maxV = max $ (\x y -> x ++ y) $ unzip e. The vertices in the graph are made from [0..(maxV-1)].

Is it ok ?

nobrakal avatar Jul 07 '18 18:07 nobrakal

Thanks for taking a look at it!

  • Which graph is best depends a bit on what you want to test. The most direct analog to FGL is the PatriciaTree, which should be almost the same performance (but with some better strictness properties). SimpleBiDigraph will be more compact in memory and should probably be faster, while still supporting most of the same operations. Digraph might be interesting as well - it is much more compact, but 1) doesn't provide access to predecessors for nodes, and 2) has an expensive test for edge existence.
  • I just added the missing NFData instances for the graph types. I don't think it is really possible to make instances for the mutable graphs, as they contain IO/STRefs. Freezing the graph and then using the NFData instance for the immutable graphs is probably the best bet there.
  • That is a good way to construct graphs. I didn't provide that function, as I ran into a number of use cases where that wasn't quite enough for my needs (e.g., when there are nodes in the graph that don't have any edges to them, the edge list construction is awkward).

travitch avatar Jul 08 '18 05:07 travitch

Thanks for the work :)

I started mine here: https://github.com/haskell-perf/graphs/tree/haggle (no many things are benchmarked, I will dig deeper inside the API soon).

I think the more valuable is SimpleBiDiGraph since it is closer to others libraries but with a different implementation.

I am just stuck with a thing: How can I bench a graph modification, since it is encapsulated inside a PrimMonad ? Sadly, I have no NFData instance available... Do you have any idea ? Ideally, I would fully evaluate the graph creation (to ensure no time is wasted after), then adding my edge, then fully evaluate the result. I don't know if this is possible...

nobrakal avatar Jul 09 '18 18:07 nobrakal

If you are running in the IO Monad, the modifications to the underlying data structures (which are MVectors and IORefs) happen exactly when you expect. The modifications to IORefs are done using modifyRef', which evaluates its argument to WHNF (since they are all Ints, that is NF). I guess my argument is that there isn't anything else to force, but I don't have a proof that there are no other thunks lying around.

For the get* operations on MSimpleBiDigraph, you could deepseq the returned values, at least.

travitch avatar Jul 10 '18 02:07 travitch

If we are sure that everything is evaluated, as you are saying, then the NFData instance could simply be const (), it should do the trick... I will make some experiments ^^.

The problem of get* operations is that they come encapsulated in the PrimMonad so the NFData is not simple...

nobrakal avatar Jul 10 '18 12:07 nobrakal

So I have do some work.

Criterion provide a function to benchmark a IO action (here it apply for mutable graphs). The only problem is that it will benchmark the whole action. So I need to benchmark the creation itself with the "action" (like addEdge) then fully evaluate the results.

Fortunately, I have implemented it for others libraries, so we can compare :)

I think my function to build a graph from a list of edges is very poor. Here are the results:

Doing:


  • edgeCount
  • vertexCount
  • addEdge

Using [("Mesh",3),("Clique",3)] as graphs

edgeCount

Description: Count the edges of the graph

Mesh

1 10 100
Alga 51.52 ns 7.943 μs 163.2 μs
Containers 30.87 ns 388.1 ns 4.460 μs
Fgl 36.41 ns 5.869 μs 202.5 μs
Haggle 914.9 ns 34.31 μs 455.7 μs
Hash-Graph 39.54 ns 5.985 μs 198.2 μs

Clique

1 10 100
Alga 51.51 ns 29.03 μs 6.486 ms
Containers 31.38 ns 1.051 μs 107.4 μs
Fgl 36.31 ns 17.85 μs 10.12 ms
Haggle 904.9 ns 92.23 μs 13.55 ms
Hash-Graph 39.48 ns 19.89 μs 7.820 ms

SUMMARY:

  • Containers was the fastest 6 times

ABSTRACT: (Based on an average of the ratio between largest benchmarks)

  • Containers was 112.93 times faster than Haggle
  • Alga was 2.48 times faster than Haggle
  • Hash-Graph was 2.05 times faster than Haggle
  • Fgl was 1.84 times faster than Haggle

vertexCount

Description: Count the vertices of the graph

Mesh

1 10 100
Alga 35.03 ns 910.1 ns 14.45 μs
Fgl 25.93 ns 4.446 μs 182.4 μs
Haggle 919.7 ns 33.88 μs 456.4 μs
Hash-Graph 26.36 ns 5.281 μs 192.5 μs

Clique

1 10 100
Alga 34.67 ns 3.247 μs 798.9 μs
Fgl 25.66 ns 12.37 μs 5.190 ms
Haggle 899.5 ns 93.55 μs 13.26 ms
Hash-Graph 25.62 ns 18.58 μs 7.770 ms

SUMMARY:

  • Alga was the fastest 4 times

There was 2 ex-aequo

ABSTRACT: (Based on an average of the ratio between largest benchmarks)

  • Alga was 21.76 times faster than Haggle
  • Fgl was 2.54 times faster than Haggle
  • Hash-Graph was 1.94 times faster than Haggle

addEdge

Description: Add an edge (not already in the graph)

Mesh

1 10 100
Alga 50.26 ns 747.4 ns 9.611 μs
Fgl 132.0 ns 6.361 μs 213.3 μs
Haggle 1.291 μs 35.80 μs 461.4 μs
Hash-Graph 159.8 ns 7.108 μs 212.8 μs

Clique

1 10 100
Alga 50.43 ns 2.484 μs 257.6 μs
Fgl 133.2 ns 19.71 μs 10.41 ms
Haggle 1.328 μs 95.85 μs 13.15 ms
Hash-Graph 164.3 ns 23.07 μs 8.069 ms

SUMMARY:

  • Alga was the fastest 18 times

ABSTRACT: (Based on an average of the ratio between largest benchmarks)

  • Alga was 49.48 times faster than Haggle
  • Hash-Graph was 1.91 times faster than Haggle
  • Fgl was 1.73 times faster than Haggle

So I think my function is so poor that it kills the results...

nobrakal avatar Jul 14 '18 15:07 nobrakal