alga Replacing Algebra.Graph by Maybe (Algebra.Graph.NonEmpty)

Hi,

I have been working on a different implementation of Algebra.Graph, based on Algebra.Graph.NonEmpty.

The idea is to have:

newtype Graph a = G (Maybe (NonEmptyGraph a))

Using patterns, with:

pattern Empty :: Graph a
pattern Empty = G Nothing

pattern Vertex :: a -> Graph a
pattern Vertex a = G (Just (N.Vertex a))

pattern Overlay :: NonEmptyGraph a -> NonEmptyGraph a -> Graph a
pattern Overlay a b = G (Just (N.Overlay a b))

pattern Connect :: NonEmptyGraph a -> NonEmptyGraph a -> Graph a
pattern Connect a b = G (Just (N.Connect a b))

One can reproduce the traditional Algebra.Graph interface.

This big change can be summarized with some functions:

overlay :: Graph a -> Graph a -> Graph a
overlay l@(G a) r@(G b) = maybe r (\x -> maybe l (Overlay x) b) a

We now check the neutrality of empty directly into the overlay and connect functions.

edges :: [(a, a)] -> Graph a
edges = G . fmap N.edges1 . NL.nonEmpty

edges become more complicated and slower.

foldg :: b -> (a -> b) -> (b -> b -> b) -> (b -> b -> b) -> Graph a -> b
foldg e v o c (G g) = maybe e (N.foldg1 v o c) g

foldg is directly based on foldg1 (and this is the case for others functions, when possible).

removeEdge :: Eq a => a -> a -> Graph a -> Graph a
removeEdge s t (G g) = G $ N.removeEdge s t <$> g

fmap allow a pretty syntax when a function will act as the identity on the empty graph.

Some benefits

Type safety in the inner representation: one can be sure when pattern matching Overlay or Connect that the two graphs are both non-empty.
Faster graph inspection. Because of the dropped Empty constructor, foldg1 is faster than foldg. Thus, inspection functions are faster.

Some problems:

The creation takes longer from a list of edges, or anything, because we pattern match to detect the empty graph.
Graph modification takes longer, mainly this is affectinginduce because we have to wrap the intermediate results into the Maybe monad.
Old haddock (for ghc 7.8) have problems with patterns.

Benchs:

Using the pretty little script I made (on my own computer with graph with 1000 vertices):

isEmpty: 0.63 (good)
vertexList: 0.98 (OK)
vertexCount: 1.01 (OK)
hasVertex: 0.85 (good)
edgeCount: 0.91 (OK)
edgeList: 0.74 (good)
hasEdge: 0.45 (good)
addEdge: 1.22 (bad)
addVertex: 1.06 (OK)
removeVertex: 1.25 (bad)
equality: 1.01 (OK)
removeEdge: 1.00 (OK)
transpose: 0.84 (good)
creation: 1.29 (bad)

So for example addEdge was so fast, that there is drop here when we pattern match to see if it the previous gaph was empty or not.

(Note that I have sneaked the improvement on hasEdge of #88 (but it is even better here)).

The benchmarked code is here: https://github.com/nobrakal/alga/blob/MaybeNonEmpty/src/Algebra/Graph.hs

Jul 03 '18 14:07 nobrakal

@nobrakal Many thanks for the experiment!

I have been wondering recently: do we even need graphs that can be empty? Presumably, non-empty graphs cover 99% of use-cases and are generally safer not only in terms of internal representation, but for the users too.

The creation takes longer from a list of edges, or anything, because we pattern match to detect the empty graph.

I don't understand this: it's just a single comparison, right? It shouldn't even be measurable.

Jul 03 '18 22:07 snowleopard

I have been wondering recently: do we even need graphs that can be empty?

With the actual Empty constructor, I don't think so. But providing an encapsulated version using Maybe NonEmptyGraph:

allow to have "regular" functions (edges take a list, and not an NonEmpty one)
is coherent with functions from Algebra.Graph.NonEmpty (like induce1)
Will provide all the handy functions dealing with empty graphs, so a user who must deals with them have not to rewrite them, even if they are really dumb (like `isEmpty (G g) = isNothing g).
introduce the user to NonEmptyGraph, it may even worth a adding a mention saying that NonEmptyGraph is faster in all cases, and if they are dealing with nonEmptyOne

But I think we definitely need to focus more in NonEmpty :)

I don't understand this: it's just a single comparison, right? It shouldn't even be measurable.

I don't understand yet too... It is a huge drop for a single comparison !

Jul 04 '18 03:07 nobrakal

I don't understand yet too... It is a huge drop for a single comparison !

Perhaps, it prevents some important optimisation? Could you investigate by looking at generated Core?

Jul 04 '18 10:07 snowleopard

Perhaps, it prevents some important optimisation?

It was the case for edges. Habitually, there is a merge between a foldr and map (this is what we used to speedup overlays), and after some search, I found that it didn't happened here.

I fixed it by hand by writing:

edges :: [(a, a)] -> Graph a
edges [] = Empty
edges (x:xs) = G $ Just $ foldr (N.Overlay . uncurry N.edge) (uncurry N.edge x) xs

It dropped to ratio to 1.09, so it is good. I don't know if we can do better

Jul 04 '18 15:07 nobrakal

@nobrakal Why not use edges1 in the second case? Then you don't need to reimplement it.

Jul 04 '18 17:07 snowleopard

In fact I need to do the change in edges1 and use it from Algebra.Graph.edges.

edges1 (x :| xs) = foldr (Overlay . uncurry edge) (uncurry edge x) xs

The problem is that the combination of foldr1 and fmap (uncurry edge) does not behave very well ^^

Jul 04 '18 18:07 nobrakal

The last benchs:

isEmpty: 0.63 (good) 
vertexList: 1.06 (OK)
VertexCount: 1.07 (OK) 
hasVertex: 5.13 (bad) 
edgeCount: 1.09 (OK) 
edgeList: 0.76 (good) 
hasEdge: 0.40 (good) 
hasSelfLoop: 1.17 (bad) 
addEdge: 1.08 (OK) 
addVertex: 1.07 (OK) 
removeVertex: 1.15 (bad) 
equality: 1.06 (OK)
removeEdge: 1.04 (OK) 
transpose: 1.01 (OK) 
creation: 0.97 (OK)

So everything is ok, note that the rough number for hasVertex is a false positive: it search for the vertex 0 which is at the end of the graph in the new representation, and at the begining for the actual one, so the times are not the same (since || looks first for the left side)

Jul 06 '18 12:07 nobrakal

note that the rough number for hasVertex is a false positive

Hmm, this makes me think that the benchmarking suite is too fragile: it shouldn't measure performance by looking up only a single vertex. It should average time for several different vertices, otherwise some libraries/implementations will get an unfair advantage by chance.

Jul 06 '18 15:07 snowleopard

Sorry, I wasn't clear. I actually test 4/5 vertices (for a graph with n vertices I test the search of vertex 0, n/3, 2*n/3, n+1). But the difference is actually very huge for the first one (50 ns for the actual Alga but 3 micro seconds for the new one (3 micro seconds it the time taken by both for searching a vertex at the end of the graph )).

It might worth it to add some vertices, but I don't think it will really change the results ^^

Jul 06 '18 16:07 nobrakal

@nobrakal Ah, I see, thanks for clarifying... Yes, that's what one gets for having O(n) complexity :) There is really a huge swing between best and worst cases.

Jul 07 '18 23:07 snowleopard

Since we are now starting to look more carefully at non-empty graphs, I think it may be worth adding them into the regression suite... Perhaps, into a separate Travis instance, so we don't hit the time limit?

Jul 07 '18 23:07 snowleopard

Yeah indeed ^^.

Perhaps, into a separate Travis instance

I totally agree, I will send a PR

Jul 08 '18 17:07 nobrakal

I just realised that NonEmpty.Graph (Maybe a) is isomorphic to Graph a:

NonEmpty.Vertex (Just a) ~ Vertex a
NonEmpty.Vertex Nothing ~ Empty

Not sure this observartion is useful, but I thought I better record it here.

I still think Maybe (NonEmpty.Graph a) is a better representation for possibly empty graphs.

Dec 27 '18 18:12 snowleopard

alga alga copied to clipboard

Replacing Algebra.Graph by Maybe (Algebra.Graph.NonEmpty)

Some benefits

Some problems:

Benchs:

alga
alga copied to clipboard