alga Explore alternative implementations for AdjacencyMap

We currently store vertices of type a directly in an AdjacencyMap a, which may lead to poor performance if the corresponding Ord a instance is slow. One example, which motivated me to record this issue, is the SCC algorithm developed in #128 that produces AdjacencyMap (NonEmpty.AdjacencyMap a), i.e. graphs whose vertices are graphs (and graphs are expensive to compare, see #126).

An alternative would be to define something like

data AdjacencyMap a = AdjacencyMap
    { graph :: AdjacencyIntMap
    , value :: Int -> a
    , index :: a -> Maybe Int }

This is similar to Data.Graph.graphFromEdges:

http://hackage.haskell.org/package/containers-0.6.0.1/docs/Data-Graph.html#v:graphFromEdges

In this way, we can benefit from the high performance of the underlying AdjacencyIntMap yet provide a typed high-level interface AdjacencyMap a to the user.

Nov 15 '18 12:11 snowleopard

Perhaps more generally, it would be nice to be able to provide a bijective map between the "real" node type and a "key" type which is cheaper to store, or has a cheaper (or existent) Ord instance.

Nov 30 '18 17:11 michaelpj

Note that this can be a blocker for moving from Data.Graph, which allows the alternative key type. I think I just rediscovered why past-me made that comment ;)

May 20 '19 10:05 michaelpj

I will create a prototype

Jul 20 '19 23:07 Avasil

@snowleopard I finally found some time to start working on this and I'm wondering about operations that combine graphs, e.g. overlay.

If we define value :: Int -> a and index :: a -> Maybe Int on graph itself, there could be two graphs with different mappings, right?

I guess we could combine index functions but I don't know what to do with value which seems to be total.

This is of course if I'm understanding the issue correctly which is to check if this representation of AdjacencyMap could improve anything and if so, rewrite API to forward to AdjacencyIntMap

Aug 07 '19 21:08 Avasil

there could be two graphs with different mappings, right?

Yes.

I guess we could combine index functions but I don't know what to do with value which seems to be total.

Actually, I'm not entirely sure my draft above is correct. What we really need to represent here is an isomorphism between vertices a that appear in the graph and their integer counterparts Int. Do you have an idea of how we could do this both efficiently and in a compositional way to permit operations like overlay?

This is of course if I'm understanding the issue correctly which is to check if this representation of AdjacencyMap could improve anything and if so, rewrite API to forward to AdjacencyIntMap

That's right. We know that AdjacencyIntMap is fast and we'd like to benefit from its efficiency when working with more complex vertex types a.

Aug 07 '19 22:08 snowleopard

That's right. We know that AdjacencyIntMap is fast and we'd like to benefit from its efficiency when working with more complex vertex types a.

Re-reading what I wrote, I think this goal sounds a bit naive: if it were possible, then we could similarly implement Map k a on top of IntMap a :-)

Aug 07 '19 22:08 snowleopard

Do you have an idea of how we could do this both efficiently and in a compositional way to permit operations like overlay?

No clue but I will try to figure something out.

Do you have anything else that I could help with in the meantime?

Aug 08 '19 08:08 Avasil

Brainstorming a few ideas:

Redo the indices for one side:
- For each mapping therein:
  - If it coincides with the mapping on the other side, keep it.
  - If it maps to an index not used on the other side, keep it.
  - Othewise map to a fresh integer.
- Complexity seems likely terrible.
  - That might be okay for some workflows if we retain the nice querying properties of AdjacencyIntMap.
Forcibly ensure disjointness.
- Partition the mappings into common, left only, and right only.
- Multiply all the indices by 10 and add 1, 2, or 3, respectively if it is common, left, or right.
  - I'm sure there is a much simpler way to do this, but this is the stupidest thing I could think of.
- You're going to rapidly end up with some big integers.
  - This is essentially using integers as a crappy trie - could we actually write a version that used some kind of tree as a key?
For hashable values, use their hash as the index.
- Needs some fallback in case of collisions, unclear what it should be.

Aug 08 '19 09:08 michaelpj

@michaelpj Thanks for the ideas! Most likely, we don't really need to make operations like overlay and connect really fast, because constructing large adjacency maps using these operations is going to be terribly slow anyway. I think it's worth keeping a "dense" mapping to Ints so that we don't run out of them in practice and could rely on IntSets and IntMaps internally.

Do you have anything else that I could help with in the meantime?

@Avasil I think it would be great to prototype the most straightforward implementation of this idea that you can come up with and see how well it performs in benchmarks.

(As a completely separate issue, you might look into #90 which came up in your recent PR #222.)

Aug 08 '19 09:08 snowleopard

Yes, I agree that querying performance is usually much more relevant. So the relabeling approach might be fine in practice.

Aug 08 '19 09:08 michaelpj

@snowleopard I will try validating @michaelpj ideas first, maybe it can lead us somewhere. :D

Redo the indices for one side:

For each mapping therein:

If it coincides with the mapping on the other side, keep it.

If it maps to an index not used on the other side, keep it.

Othewise map to a fresh integer.

Complexity seems likely terrible.

It's not clear to me how to modify a value :: Int -> a function. My first attempt was going through vertices of one of the graphs (let's say G2), applying value of both graphs and checking if the result is different. If it is different, I call index G1 and see if it's Nothing. If it's not, I don't know how to find new integer. Should I go through all vertices of G1 and choose different one?

It feels like each overlay and connect would add extra "layer" on top of these functions which would be expensive anytime we call them, even during queries. Hopefully I'm misunderstanding or not familiar with a technique to achieve this in different way

Aug 10 '19 21:08 Avasil

@Avasil Why don't you start with something naive but very simple: just renumber everything (always keeping the indices starting at 0) whenever you do something like overlay or connect. Then find the resulting complexity and see where we can go from there. It's always best to start with something that works and then iterate & optimise!

Aug 10 '19 23:08 snowleopard

@snowleopard I managed to implement and benchmark few functions.

I plan to add tests now so I can get rid of bugs and then keep trying to optimize. Though any suggestions would be very helpful, currently it's very slow

I changed representation a bit:

data AdjacencyMapPoc a = AM
    { graph :: AIM.AdjacencyIntMap
    , valueMap :: IntMap a
    , indexMap :: Map a Int } deriving (Generic)

value :: AdjacencyMapPoc a -> Int -> Maybe a
value g i = IntMap.lookup i (valueMap g)

index :: Ord a => AdjacencyMapPoc a -> a -> Maybe Int
index g a = Map.lookup a (indexMap g)

I didn't know how to create value :: Int -> a so I was using value :: Int -> Maybe a. For overlay / connect I keep one side and then fold vertices of the other side and map them to new indices which I add to the old mappings. I don't know how to update functions without Map

Comparison to AdjacencyMap: (with a = Int)

addEdge: 26.16 (bad)
addVertex: 17.67 (bad)
creation: 3.28 (bad)
edgeCount: 2.93 (bad)
edgeList: 4.59 (bad)
hasEdge: 1.48 (bad)
hasVertex: 1.27 (bad)
isEmpty: 1.08 (OK)
vertexCount: 33.87 (bad)
vertexList: 3.57 (bad)

Here's my branch https://github.com/Avasil/alga/blob/adjMap-alternative-impl/src/Algebra/Graph/AdjacencyMapPoc.hs Although the code is very messy right now :D

Aug 15 '19 18:08 Avasil

I added implementations for removeVertex, removeEdge and improved hasVertex:

hasVertex: 1.04 (OK)
removeEdge: 0.80 (good)
removeVertex: 0.55 (good)

I don't really understand why removing is so fast and vertexCount so terribly slow.

BTW which functions are the most interesting to us in this experiment? I could focus on implementing those next

Aug 16 '19 20:08 Avasil

@Avasil Thanks for the experiment! I don't have time to look at your implementation in detail at the moment, but it's indeed strange that vertexCount is so slow, e.g. there is a 10x difference compared to vertexList which should be at least as hard! It would be nice to bring all overheads down to 2-3x.

BTW which functions are the most interesting to us in this experiment?

I think your current selection is great. Perhaps you could also add preSet and postSet, since they are often needed in algorithms.

Aug 16 '19 20:08 snowleopard

I think the issue with vertexCount was caused by a bug I've had in overlay, latest results are:

addEdge: 27.98 (bad)
addVertex: 20.13 (bad)
creation: 3.36 (bad)
edgeCount: 3.31 (bad)
edgeList: 4.51 (bad)
equality: 14.01 (bad)
hasEdge: 6.67 (bad)
hasVertex: 0.98 (OK)
isEmpty: 1.04 (OK)
removeEdge: 0.91 (OK)
removeVertex: 0.61 (good)
vertexCount: 1.04 (OK)
vertexList: 0.93 (OK)

equality is quite bad but I started with simple implementation:

instance Ord a => Eq (AdjacencyMapPoc a) where
    (==) x y = and
        [ (==) (vertexCount x) (vertexCount  y)
        , (==) (vertexSet   x) (vertexSet    y)
        , (==) (edgeCount   x) (edgeCount    y)
        , (==) (edgeSet     x) (edgeSet      y) ]

which doesn't leverage AdjacencyIntMap so it's expected.

I also tried to compare underlying AdjacencyIntMap but the problem is that I need to convert values to the same mappings. My first try (gmap over one AIM) was 1000x slower than AdjacencyMap equality. :D

Aug 18 '19 17:08 Avasil

@Avasil Is there a chance to bring the overhead in addVertex and addEdge down to 3-5x?

Sep 10 '19 21:09 snowleopard

@snowleopard Current implementation of overlay and connect is quite involved so I assume it is possible to simplify it https://github.com/Avasil/alga/blob/adjMap-alternative-impl/src/Algebra/Graph/AdjacencyMapPoc.hs#L282

I haven't looked into it yet

Sep 10 '19 21:09 Avasil

alga alga copied to clipboard

Explore alternative implementations for AdjacencyMap

alga
alga copied to clipboard