alga icon indicating copy to clipboard operation
alga copied to clipboard

Replacing Algebra.Graph by Maybe (Algebra.Graph.NonEmpty)

Open nobrakal opened this issue 6 years ago • 13 comments

Hi,

I have been working on a different implementation of Algebra.Graph, based on Algebra.Graph.NonEmpty.

The idea is to have:

newtype Graph a = G (Maybe (NonEmptyGraph a))

Using patterns, with:

pattern Empty :: Graph a
pattern Empty = G Nothing

pattern Vertex :: a -> Graph a
pattern Vertex a = G (Just (N.Vertex a))

pattern Overlay :: NonEmptyGraph a -> NonEmptyGraph a -> Graph a
pattern Overlay a b = G (Just (N.Overlay a b))

pattern Connect :: NonEmptyGraph a -> NonEmptyGraph a -> Graph a
pattern Connect a b = G (Just (N.Connect a b))

One can reproduce the traditional Algebra.Graph interface.

This big change can be summarized with some functions:

overlay :: Graph a -> Graph a -> Graph a
overlay l@(G a) r@(G b) = maybe r (\x -> maybe l (Overlay x) b) a

We now check the neutrality of empty directly into the overlay and connect functions.

edges :: [(a, a)] -> Graph a
edges = G . fmap N.edges1 . NL.nonEmpty

edges become more complicated and slower.

foldg :: b -> (a -> b) -> (b -> b -> b) -> (b -> b -> b) -> Graph a -> b
foldg e v o c (G g) = maybe e (N.foldg1 v o c) g

foldg is directly based on foldg1 (and this is the case for others functions, when possible).

removeEdge :: Eq a => a -> a -> Graph a -> Graph a
removeEdge s t (G g) = G $ N.removeEdge s t <$> g

fmap allow a pretty syntax when a function will act as the identity on the empty graph.

Some benefits

  • Type safety in the inner representation: one can be sure when pattern matching Overlay or Connect that the two graphs are both non-empty.
  • Faster graph inspection. Because of the dropped Empty constructor, foldg1 is faster than foldg. Thus, inspection functions are faster.

Some problems:

  • The creation takes longer from a list of edges, or anything, because we pattern match to detect the empty graph.
  • Graph modification takes longer, mainly this is affectinginduce because we have to wrap the intermediate results into the Maybe monad.
  • Old haddock (for ghc 7.8) have problems with patterns.

Benchs:

Using the pretty little script I made (on my own computer with graph with 1000 vertices):

isEmpty: 0.63 (good)
vertexList: 0.98 (OK)
vertexCount: 1.01 (OK)
hasVertex: 0.85 (good)
edgeCount: 0.91 (OK)
edgeList: 0.74 (good)
hasEdge: 0.45 (good)
addEdge: 1.22 (bad)
addVertex: 1.06 (OK)
removeVertex: 1.25 (bad)
equality: 1.01 (OK)
removeEdge: 1.00 (OK)
transpose: 0.84 (good)
creation: 1.29 (bad)

So for example addEdge was so fast, that there is drop here when we pattern match to see if it the previous gaph was empty or not.

(Note that I have sneaked the improvement on hasEdge of #88 (but it is even better here)).

The benchmarked code is here: https://github.com/nobrakal/alga/blob/MaybeNonEmpty/src/Algebra/Graph.hs

nobrakal avatar Jul 03 '18 14:07 nobrakal

@nobrakal Many thanks for the experiment!

I have been wondering recently: do we even need graphs that can be empty? Presumably, non-empty graphs cover 99% of use-cases and are generally safer not only in terms of internal representation, but for the users too.

The creation takes longer from a list of edges, or anything, because we pattern match to detect the empty graph.

I don't understand this: it's just a single comparison, right? It shouldn't even be measurable.

snowleopard avatar Jul 03 '18 22:07 snowleopard

I have been wondering recently: do we even need graphs that can be empty?

With the actual Empty constructor, I don't think so. But providing an encapsulated version using Maybe NonEmptyGraph:

  • allow to have "regular" functions (edges take a list, and not an NonEmpty one)
  • is coherent with functions from Algebra.Graph.NonEmpty (like induce1)
  • Will provide all the handy functions dealing with empty graphs, so a user who must deals with them have not to rewrite them, even if they are really dumb (like `isEmpty (G g) = isNothing g).
  • introduce the user to NonEmptyGraph, it may even worth a adding a mention saying that NonEmptyGraph is faster in all cases, and if they are dealing with nonEmptyOne

But I think we definitely need to focus more in NonEmpty :)

I don't understand this: it's just a single comparison, right? It shouldn't even be measurable.

I don't understand yet too... It is a huge drop for a single comparison !

nobrakal avatar Jul 04 '18 03:07 nobrakal

I don't understand yet too... It is a huge drop for a single comparison !

Perhaps, it prevents some important optimisation? Could you investigate by looking at generated Core?

snowleopard avatar Jul 04 '18 10:07 snowleopard

Perhaps, it prevents some important optimisation?

It was the case for edges. Habitually, there is a merge between a foldr and map (this is what we used to speedup overlays), and after some search, I found that it didn't happened here.

I fixed it by hand by writing:

edges :: [(a, a)] -> Graph a
edges [] = Empty
edges (x:xs) = G $ Just $ foldr (N.Overlay . uncurry N.edge) (uncurry N.edge x) xs

It dropped to ratio to 1.09, so it is good. I don't know if we can do better

nobrakal avatar Jul 04 '18 15:07 nobrakal

@nobrakal Why not use edges1 in the second case? Then you don't need to reimplement it.

snowleopard avatar Jul 04 '18 17:07 snowleopard

In fact I need to do the change in edges1 and use it from Algebra.Graph.edges.

edges1 (x :| xs) = foldr (Overlay . uncurry edge) (uncurry edge x) xs

The problem is that the combination of foldr1 and fmap (uncurry edge) does not behave very well ^^

nobrakal avatar Jul 04 '18 18:07 nobrakal

The last benchs:

isEmpty: 0.63 (good) 
vertexList: 1.06 (OK)
VertexCount: 1.07 (OK) 
hasVertex: 5.13 (bad) 
edgeCount: 1.09 (OK) 
edgeList: 0.76 (good) 
hasEdge: 0.40 (good) 
hasSelfLoop: 1.17 (bad) 
addEdge: 1.08 (OK) 
addVertex: 1.07 (OK) 
removeVertex: 1.15 (bad) 
equality: 1.06 (OK)
removeEdge: 1.04 (OK) 
transpose: 1.01 (OK) 
creation: 0.97 (OK)

So everything is ok, note that the rough number for hasVertex is a false positive: it search for the vertex 0 which is at the end of the graph in the new representation, and at the begining for the actual one, so the times are not the same (since || looks first for the left side)

nobrakal avatar Jul 06 '18 12:07 nobrakal

note that the rough number for hasVertex is a false positive

Hmm, this makes me think that the benchmarking suite is too fragile: it shouldn't measure performance by looking up only a single vertex. It should average time for several different vertices, otherwise some libraries/implementations will get an unfair advantage by chance.

snowleopard avatar Jul 06 '18 15:07 snowleopard

Sorry, I wasn't clear. I actually test 4/5 vertices (for a graph with n vertices I test the search of vertex 0, n/3, 2*n/3, n+1). But the difference is actually very huge for the first one (50 ns for the actual Alga but 3 micro seconds for the new one (3 micro seconds it the time taken by both for searching a vertex at the end of the graph )).

It might worth it to add some vertices, but I don't think it will really change the results ^^

nobrakal avatar Jul 06 '18 16:07 nobrakal

@nobrakal Ah, I see, thanks for clarifying... Yes, that's what one gets for having O(n) complexity :) There is really a huge swing between best and worst cases.

snowleopard avatar Jul 07 '18 23:07 snowleopard

Since we are now starting to look more carefully at non-empty graphs, I think it may be worth adding them into the regression suite... Perhaps, into a separate Travis instance, so we don't hit the time limit?

snowleopard avatar Jul 07 '18 23:07 snowleopard

Yeah indeed ^^.

Perhaps, into a separate Travis instance

I totally agree, I will send a PR

nobrakal avatar Jul 08 '18 17:07 nobrakal

I just realised that NonEmpty.Graph (Maybe a) is isomorphic to Graph a:

  • NonEmpty.Vertex (Just a) ~ Vertex a
  • NonEmpty.Vertex Nothing ~ Empty

Not sure this observartion is useful, but I thought I better record it here.

I still think Maybe (NonEmpty.Graph a) is a better representation for possibly empty graphs.

snowleopard avatar Dec 27 '18 18:12 snowleopard