alga
alga copied to clipboard
Replacing Algebra.Graph by Maybe (Algebra.Graph.NonEmpty)
Hi,
I have been working on a different implementation of Algebra.Graph
, based on Algebra.Graph.NonEmpty
.
The idea is to have:
newtype Graph a = G (Maybe (NonEmptyGraph a))
Using patterns, with:
pattern Empty :: Graph a
pattern Empty = G Nothing
pattern Vertex :: a -> Graph a
pattern Vertex a = G (Just (N.Vertex a))
pattern Overlay :: NonEmptyGraph a -> NonEmptyGraph a -> Graph a
pattern Overlay a b = G (Just (N.Overlay a b))
pattern Connect :: NonEmptyGraph a -> NonEmptyGraph a -> Graph a
pattern Connect a b = G (Just (N.Connect a b))
One can reproduce the traditional Algebra.Graph
interface.
This big change can be summarized with some functions:
overlay :: Graph a -> Graph a -> Graph a
overlay l@(G a) r@(G b) = maybe r (\x -> maybe l (Overlay x) b) a
We now check the neutrality of empty
directly into the overlay
and connect
functions.
edges :: [(a, a)] -> Graph a
edges = G . fmap N.edges1 . NL.nonEmpty
edges
become more complicated and slower.
foldg :: b -> (a -> b) -> (b -> b -> b) -> (b -> b -> b) -> Graph a -> b
foldg e v o c (G g) = maybe e (N.foldg1 v o c) g
foldg
is directly based on foldg1
(and this is the case for others functions, when possible).
removeEdge :: Eq a => a -> a -> Graph a -> Graph a
removeEdge s t (G g) = G $ N.removeEdge s t <$> g
fmap
allow a pretty syntax when a function will act as the identity on the empty graph.
Some benefits
- Type safety in the inner representation: one can be sure when pattern matching
Overlay
orConnect
that the two graphs are both non-empty. - Faster graph inspection. Because of the dropped
Empty
constructor,foldg1
is faster thanfoldg
. Thus, inspection functions are faster.
Some problems:
- The creation takes longer from a list of edges, or anything, because we pattern match to detect the empty graph.
- Graph modification takes longer, mainly this is affecting
induce
because we have to wrap the intermediate results into theMaybe
monad. - Old haddock (for ghc 7.8) have problems with patterns.
Benchs:
Using the pretty little script I made (on my own computer with graph with 1000 vertices):
isEmpty: 0.63 (good)
vertexList: 0.98 (OK)
vertexCount: 1.01 (OK)
hasVertex: 0.85 (good)
edgeCount: 0.91 (OK)
edgeList: 0.74 (good)
hasEdge: 0.45 (good)
addEdge: 1.22 (bad)
addVertex: 1.06 (OK)
removeVertex: 1.25 (bad)
equality: 1.01 (OK)
removeEdge: 1.00 (OK)
transpose: 0.84 (good)
creation: 1.29 (bad)
So for example addEdge
was so fast, that there is drop here when we pattern match to see if it the previous gaph was empty or not.
(Note that I have sneaked the improvement on hasEdge
of #88 (but it is even better here)).
The benchmarked code is here: https://github.com/nobrakal/alga/blob/MaybeNonEmpty/src/Algebra/Graph.hs
@nobrakal Many thanks for the experiment!
I have been wondering recently: do we even need graphs that can be empty? Presumably, non-empty graphs cover 99% of use-cases and are generally safer not only in terms of internal representation, but for the users too.
The creation takes longer from a list of edges, or anything, because we pattern match to detect the empty graph.
I don't understand this: it's just a single comparison, right? It shouldn't even be measurable.
I have been wondering recently: do we even need graphs that can be empty?
With the actual Empty
constructor, I don't think so. But providing an encapsulated version using Maybe NonEmptyGraph
:
- allow to have "regular" functions (
edges
take a list, and not anNonEmpty
one) - is coherent with functions from
Algebra.Graph.NonEmpty
(likeinduce1
) - Will provide all the handy functions dealing with empty graphs, so a user who must deals with them have not to rewrite them, even if they are really dumb (like `isEmpty (G g) = isNothing g).
- introduce the user to NonEmptyGraph, it may even worth a adding a mention saying that
NonEmptyGraph
is faster in all cases, and if they are dealing with nonEmptyOne
But I think we definitely need to focus more in NonEmpty
:)
I don't understand this: it's just a single comparison, right? It shouldn't even be measurable.
I don't understand yet too... It is a huge drop for a single comparison !
I don't understand yet too... It is a huge drop for a single comparison !
Perhaps, it prevents some important optimisation? Could you investigate by looking at generated Core?
Perhaps, it prevents some important optimisation?
It was the case for edges
. Habitually, there is a merge between a foldr
and map
(this is what we used to speedup overlays
), and after some search, I found that it didn't happened here.
I fixed it by hand by writing:
edges :: [(a, a)] -> Graph a
edges [] = Empty
edges (x:xs) = G $ Just $ foldr (N.Overlay . uncurry N.edge) (uncurry N.edge x) xs
It dropped to ratio to 1.09
, so it is good. I don't know if we can do better
@nobrakal Why not use edges1
in the second case? Then you don't need to reimplement it.
In fact I need to do the change in edges1
and use it from Algebra.Graph.edges
.
edges1 (x :| xs) = foldr (Overlay . uncurry edge) (uncurry edge x) xs
The problem is that the combination of foldr1
and fmap (uncurry edge)
does not behave very well ^^
The last benchs:
isEmpty: 0.63 (good)
vertexList: 1.06 (OK)
VertexCount: 1.07 (OK)
hasVertex: 5.13 (bad)
edgeCount: 1.09 (OK)
edgeList: 0.76 (good)
hasEdge: 0.40 (good)
hasSelfLoop: 1.17 (bad)
addEdge: 1.08 (OK)
addVertex: 1.07 (OK)
removeVertex: 1.15 (bad)
equality: 1.06 (OK)
removeEdge: 1.04 (OK)
transpose: 1.01 (OK)
creation: 0.97 (OK)
So everything is ok, note that the rough number for hasVertex
is a false positive: it search for the vertex 0 which is at the end of the graph in the new representation, and at the begining for the actual one, so the times are not the same (since || looks first for the left side)
note that the rough number for
hasVertex
is a false positive
Hmm, this makes me think that the benchmarking suite is too fragile: it shouldn't measure performance by looking up only a single vertex. It should average time for several different vertices, otherwise some libraries/implementations will get an unfair advantage by chance.
Sorry, I wasn't clear. I actually test 4/5 vertices (for a graph with n vertices I test the search of vertex 0, n/3, 2*n/3, n+1). But the difference is actually very huge for the first one (50 ns for the actual Alga but 3 micro seconds for the new one (3 micro seconds it the time taken by both for searching a vertex at the end of the graph )).
It might worth it to add some vertices, but I don't think it will really change the results ^^
@nobrakal Ah, I see, thanks for clarifying... Yes, that's what one gets for having O(n) complexity :) There is really a huge swing between best and worst cases.
Since we are now starting to look more carefully at non-empty graphs, I think it may be worth adding them into the regression suite... Perhaps, into a separate Travis instance, so we don't hit the time limit?
Yeah indeed ^^.
Perhaps, into a separate Travis instance
I totally agree, I will send a PR
I just realised that NonEmpty.Graph (Maybe a)
is isomorphic to Graph a
:
-
NonEmpty.Vertex (Just a) ~ Vertex a
-
NonEmpty.Vertex Nothing ~ Empty
Not sure this observartion is useful, but I thought I better record it here.
I still think Maybe (NonEmpty.Graph a)
is a better representation for possibly empty graphs.