alga
alga copied to clipboard
Implement hasVertexP and hasEdgeP
This is an initial PR for hasVertexP and hasEdgeP - predicate variants of hasVertex and hasEdge (reimplemented in terms of the predicate versions). (the latter of which will be useful for some rewriting work I hope to build on top of Algebra.Graph).
I've only looked at the code so far; on my todo list is at least
- [ ] check docstrings are appropriate
- [ ] write tests for hasEdgeP explicitly (but probably both)
- [x] double-check the SPECIALISE pragmas are (still) appropriate
@jmtd Cool, thank you! I like how nicely it all worked out.
There seem to be a performance regression for hasVertex
and possibly hasEdge
too:
hasEdge: 1.17 (bad)
hasVertex: 1.66 (bad)
While 1.17 may just be noise, 1.66 feels like a proper regression, likely due to the missing SPECIALISE
pragma. @nobrakal Do you agree?
Performace regression is reported by our CI infrastructure on every commit, so if you push any fixes you should be able to see the results in the DOBENCH="Graph"
Travis job. For example:
https://travis-ci.org/snowleopard/alga/jobs/446759541
Further things to do:
-
[ ] We probably want to return the actual matching vertices and edges from these functions. What about calling them
findVertex
andfindEdge
and modifying the type signatures slightly?findVertex :: (a -> Bool) -> Graph a -> [a] findEdge :: (a -> Bool) -> (a -> Bool) -> Graph a -> [(a, a)]
Then
hasVertex
could simply check if the list is empty. But I'm not sure if this can be implemented without sacrificing performance. -
[ ] Implement this functionality in
Algebra.Graph.NonEmpty
.
Hi,
While 1.17 may just be noise, 1.66 feels like a proper regression
The two are proper regression in this case :)
About the SPECIALIZE
pragma: It is here only to optimize out a class constraint, so it is still needed for both hasVertex
and hasEdge
(and adding it to hasVertexP
or hasEdgeP
will do nothing, since these functions does not have any class constraint).
The problem is that hasVertex
contains a reference to an other function (hasVertexP
) which is currently not inlined, so it add a level of indirection in the code of hasVertex
The solution is to add an INLINE
pragma to both hasVertexP
and hasEdgeP
:) (and it is anyway a good idea to inline these !).
Then hasVertex could simply check if the list is empty. But I'm not sure if this can be implemented without sacrificing performance.
Maybe lazyness can help preventing a performance drop, but I am not sure.
Adding INLINE
pragmas seems to have fixed the regression.
(There is still this strange result hasEdge: 0.69 (good)
, which we believe is a bug.)
@jmtd Could you try the following?
{-# INLINE [1] findVertices #-}
findVertices :: (a -> Bool) -> Graph a -> [a]
findVertices p = foldg [] [ x | p x ] (++) (++)
hasVertex :: Eq a => a -> Graph a -> Bool
hasVertex x = not . null $ findVertices (== x)
I'm interested to see whether this will be as fast as your implementation via hasVertexP
.
Hi @snowleopard
findVertices p = foldg [] [ x | p x ] (++) (++)
I think there must be a typo in the list comprehension here, my GHCI doesn't like it either, or perhaps I need to load a language extension?
@jmtd Oh, I'm sorry, I meant:
findVertices p = foldg [] (\x -> [ x | p x ]) (++) (++)
NP. I've just pushed that, let's see what the performance is like.
I'd quite like to reproduce the performance measurement stuff locally (to avoid round-tripping github PRs to test things). I should perhaps also take a look at Stack if I'm spring-cleaning my setup.
I'm working on a findEdges
now
As an aside, would you like PRs to add things like {-# LANGUAGE TupleSections #-}
where necessary — I see this is already specified in the cabal file, but is still needed (I think) for e.g. loading the local copy into ghci
w/o also specifying it on the command line
travis and appveyor results for the findVertices
change only (ab78e059
) show a regression
hasVertex: 1.55 (bad)
I'm going to push an initial findEdges
now to see how that performs.
(this patch series will need some rebasing before it's mergeable)
Results for findEdges
as of 5ed33b63284afb275d7584d0203d0b4327ecc02e
hasEdge: 1.35 (bad) hasVertex: 2.03 (bad)
@jmtd Thanks for giving it a try. I suggest we proceed as follows.
Let's keep the Algebra.Graph
module as is, but instead add new functionality to the generic module that defines the ToGraph
type class. All we need to implement both findVertices
and hasVertexP
is the foldg
method -- we do not need to know the graph itself:
findVertices :: ToGraph g => (ToVertex g -> Bool) -> g -> [ToVertex g]
findVertices p = foldg [] (\x -> [ x | p x ]) (++) (++)
With this approach other instances of the ToGraph
class can use these functions too.
Also, I'd like to understand your use case a bit better: how do you plan to use these functions? Merely checking if a vertex with the right properties exist (using hasVertexP
) is probably not too useful. Same about hasEdgeP
. Also, the function findVertices
could probably be implemented more directly as a combination of Set.filter
and vertexSet
.
My understanding is that you actually want to write some sort of graph transformation function that, for example, turns merges all edges that satisfy a given property into a vertex. Could you provide some more details? Otherwise, I worry that we might end up providing too many functions with similar looking functionality, hence confusing Alga users: hasVertex
, hasVertexP
, findVertices
, vertexSet
.
@snowleopard wrote:
Also, I'd like to understand your use case a bit better: how do you plan to use these functions? Merely checking if a vertex with the right properties exist (using hasVertexP) is probably not too useful. … My understanding is that you actually want to write some sort of graph transformation function that, for example, turns merges all edges that satisfy a given property into a vertex.
That's right. Here's an example of a graph rewrite rule, coded in a string, using composition to represent the flow of the stream data
streamMap f . streamMap g = streamMap (f . g)
We have stream graphs coded up using Graph and and ADT like the following
data StreamVertex = StreamVertex
{ vertexId :: Int -- necessary to stop unifying distinct nodes which are otherwise identical
, operator :: StreamOperator -- data StreamOperator = Map | Filter | Expand…
, parameters :: [String] -- strings of code representing operator arguments
, intype :: String -- capturing the type flowing into this node
} deriving (Eq)
What I am hoping to do is encode the stream rules using a similar ADT to the above instead of strings (so probably make the parameters argument a type variable and re-use the above), then apply rules to instances of Graph StreamVertex
All the rules I am working with at the moment are derived from analysing operators pair-wise, and generally swap the order of two operators, but also change their parameters; so from a Graph POV adding and removing both edges and nodes. More complex rewrite rules could involve introducing more nodes and consequently edges. We don't currently label edges, but direction is important.
@jmtd Thanks! Let me think about this for a few days. I'm travelling until 11 November, so it's a bit difficult to find time to figure out a small set of graph transformation primitives that will be useful both for your problem and for other users of the library.
Thanks @snowleopard. I'm going to see what PoC I can build on top of these primitives to further assess how this all might work for me.