alga
alga copied to clipboard
Explore alternative implementations for AdjacencyMap
We currently store vertices of type a
directly in an AdjacencyMap a
, which may lead to poor performance if the corresponding Ord a
instance is slow. One example, which motivated me to record this issue, is the SCC algorithm developed in #128 that produces AdjacencyMap (NonEmpty.AdjacencyMap a)
, i.e. graphs whose vertices are graphs (and graphs are expensive to compare, see #126).
An alternative would be to define something like
data AdjacencyMap a = AdjacencyMap
{ graph :: AdjacencyIntMap
, value :: Int -> a
, index :: a -> Maybe Int }
This is similar to Data.Graph.graphFromEdges
:
http://hackage.haskell.org/package/containers-0.6.0.1/docs/Data-Graph.html#v:graphFromEdges
In this way, we can benefit from the high performance of the underlying AdjacencyIntMap
yet provide a typed high-level interface AdjacencyMap a
to the user.
Perhaps more generally, it would be nice to be able to provide a bijective map between the "real" node type and a "key" type which is cheaper to store, or has a cheaper (or existent) Ord
instance.
Note that this can be a blocker for moving from Data.Graph
, which allows the alternative key type. I think I just rediscovered why past-me made that comment ;)
I will create a prototype
@snowleopard I finally found some time to start working on this and I'm wondering about operations that combine graphs, e.g. overlay
.
If we define value :: Int -> a
and index :: a -> Maybe Int
on graph itself, there could be two graphs with different mappings, right?
I guess we could combine index
functions but I don't know what to do with value
which seems to be total.
This is of course if I'm understanding the issue correctly which is to check if this representation of AdjacencyMap
could improve anything and if so, rewrite API to forward to AdjacencyIntMap
there could be two graphs with different mappings, right?
Yes.
I guess we could combine
index
functions but I don't know what to do withvalue
which seems to be total.
Actually, I'm not entirely sure my draft above is correct. What we really need to represent here is an isomorphism between vertices a
that appear in the graph and their integer counterparts Int
. Do you have an idea of how we could do this both efficiently and in a compositional way to permit operations like overlay
?
This is of course if I'm understanding the issue correctly which is to check if this representation of
AdjacencyMap
could improve anything and if so, rewrite API to forward toAdjacencyIntMap
That's right. We know that AdjacencyIntMap
is fast and we'd like to benefit from its efficiency when working with more complex vertex types a
.
That's right. We know that
AdjacencyIntMap
is fast and we'd like to benefit from its efficiency when working with more complex vertex typesa
.
Re-reading what I wrote, I think this goal sounds a bit naive: if it were possible, then we could similarly implement Map k a
on top of IntMap a
:-)
Do you have an idea of how we could do this both efficiently and in a compositional way to permit operations like overlay?
No clue but I will try to figure something out.
Do you have anything else that I could help with in the meantime?
Brainstorming a few ideas:
- Redo the indices for one side:
- For each mapping therein:
- If it coincides with the mapping on the other side, keep it.
- If it maps to an index not used on the other side, keep it.
- Othewise map to a fresh integer.
- Complexity seems likely terrible.
- That might be okay for some workflows if we retain the nice querying properties of
AdjacencyIntMap
.
- That might be okay for some workflows if we retain the nice querying properties of
- For each mapping therein:
- Forcibly ensure disjointness.
- Partition the mappings into common, left only, and right only.
- Multiply all the indices by 10 and add 1, 2, or 3, respectively if it is common, left, or right.
- I'm sure there is a much simpler way to do this, but this is the stupidest thing I could think of.
- You're going to rapidly end up with some big integers.
- This is essentially using integers as a crappy trie - could we actually write a version that used some kind of tree as a key?
- For hashable values, use their hash as the index.
- Needs some fallback in case of collisions, unclear what it should be.
@michaelpj Thanks for the ideas! Most likely, we don't really need to make operations like overlay
and connect
really fast, because constructing large adjacency maps using these operations is going to be terribly slow anyway. I think it's worth keeping a "dense" mapping to Int
s so that we don't run out of them in practice and could rely on IntSet
s and IntMap
s internally.
Do you have anything else that I could help with in the meantime?
@Avasil I think it would be great to prototype the most straightforward implementation of this idea that you can come up with and see how well it performs in benchmarks.
(As a completely separate issue, you might look into #90 which came up in your recent PR #222.)
Yes, I agree that querying performance is usually much more relevant. So the relabeling approach might be fine in practice.
@snowleopard I will try validating @michaelpj ideas first, maybe it can lead us somewhere. :D
Redo the indices for one side:
For each mapping therein:
- If it coincides with the mapping on the other side, keep it.
- If it maps to an index not used on the other side, keep it.
- Othewise map to a fresh integer.
Complexity seems likely terrible.
It's not clear to me how to modify a value :: Int -> a
function. My first attempt was going through vertices of one of the graphs (let's say G2), applying value
of both graphs and checking if the result is different. If it is different, I call index G1
and see if it's Nothing
. If it's not, I don't know how to find new integer. Should I go through all vertices of G1
and choose different one?
It feels like each overlay
and connect
would add extra "layer" on top of these functions which would be expensive anytime we call them, even during queries. Hopefully I'm misunderstanding or not familiar with a technique to achieve this in different way
@Avasil Why don't you start with something naive but very simple: just renumber everything (always keeping the indices starting at 0) whenever you do something like overlay
or connect
. Then find the resulting complexity and see where we can go from there. It's always best to start with something that works and then iterate & optimise!
@snowleopard I managed to implement and benchmark few functions.
I plan to add tests now so I can get rid of bugs and then keep trying to optimize. Though any suggestions would be very helpful, currently it's very slow
I changed representation a bit:
data AdjacencyMapPoc a = AM
{ graph :: AIM.AdjacencyIntMap
, valueMap :: IntMap a
, indexMap :: Map a Int } deriving (Generic)
value :: AdjacencyMapPoc a -> Int -> Maybe a
value g i = IntMap.lookup i (valueMap g)
index :: Ord a => AdjacencyMapPoc a -> a -> Maybe Int
index g a = Map.lookup a (indexMap g)
I didn't know how to create value :: Int -> a
so I was using value :: Int -> Maybe a
.
For overlay
/ connect
I keep one side and then fold vertices of the other side and map them to new indices which I add to the old mappings. I don't know how to update functions without Map
Comparison to AdjacencyMap: (with a = Int)
addEdge: 26.16 (bad)
addVertex: 17.67 (bad)
creation: 3.28 (bad)
edgeCount: 2.93 (bad)
edgeList: 4.59 (bad)
hasEdge: 1.48 (bad)
hasVertex: 1.27 (bad)
isEmpty: 1.08 (OK)
vertexCount: 33.87 (bad)
vertexList: 3.57 (bad)
Here's my branch https://github.com/Avasil/alga/blob/adjMap-alternative-impl/src/Algebra/Graph/AdjacencyMapPoc.hs Although the code is very messy right now :D
I added implementations for removeVertex
, removeEdge
and improved hasVertex
:
hasVertex: 1.04 (OK)
removeEdge: 0.80 (good)
removeVertex: 0.55 (good)
I don't really understand why removing is so fast and vertexCount
so terribly slow.
BTW which functions are the most interesting to us in this experiment? I could focus on implementing those next
@Avasil Thanks for the experiment! I don't have time to look at your implementation in detail at the moment, but it's indeed strange that vertexCount
is so slow, e.g. there is a 10x difference compared to vertexList
which should be at least as hard! It would be nice to bring all overheads down to 2-3x.
BTW which functions are the most interesting to us in this experiment?
I think your current selection is great. Perhaps you could also add preSet
and postSet
, since they are often needed in algorithms.
I think the issue with vertexCount
was caused by a bug I've had in overlay
, latest results are:
addEdge: 27.98 (bad)
addVertex: 20.13 (bad)
creation: 3.36 (bad)
edgeCount: 3.31 (bad)
edgeList: 4.51 (bad)
equality: 14.01 (bad)
hasEdge: 6.67 (bad)
hasVertex: 0.98 (OK)
isEmpty: 1.04 (OK)
removeEdge: 0.91 (OK)
removeVertex: 0.61 (good)
vertexCount: 1.04 (OK)
vertexList: 0.93 (OK)
equality
is quite bad but I started with simple implementation:
instance Ord a => Eq (AdjacencyMapPoc a) where
(==) x y = and
[ (==) (vertexCount x) (vertexCount y)
, (==) (vertexSet x) (vertexSet y)
, (==) (edgeCount x) (edgeCount y)
, (==) (edgeSet x) (edgeSet y) ]
which doesn't leverage AdjacencyIntMap
so it's expected.
I also tried to compare underlying AdjacencyIntMap
but the problem is that I need to convert values to the same mappings. My first try (gmap
over one AIM
) was 1000x slower than AdjacencyMap equality. :D
@Avasil Is there a chance to bring the overhead in addVertex
and addEdge
down to 3-5x?
@snowleopard Current implementation of overlay
and connect
is quite involved so I assume it is possible to simplify it https://github.com/Avasil/alga/blob/adjMap-alternative-impl/src/Algebra/Graph/AdjacencyMapPoc.hs#L282
I haven't looked into it yet