alga
alga copied to clipboard
Algorithms should be in terms of typeclasses not data structures
It's a bit of a shame that things like bfs
and dfs
are duplicated for every single AdjacencyMap
representation. For example Algebra.Graph.Labelled.AdjacencyMap
does not even have a DFS at the moment. They should be rewritten and unified in terms of ToGraph
instead.
Of course not all the instances of ToGraph
have efficient implementations of things like postSet
, but that could also be improved, by splitting the class up and having Graph
and AdjacencyMap
implement different subsets of it, with utilities functions called, e.g. postSetSlow
that provide a shim to the full set of methods. It seems like nearly all of the typical traditional graph algorithms need an efficient postSet
method for traversing the graph, whereas very few of them need an efficient connect
method.
To clarify, although ToGraph
contains class methods dfs
etc, the implementations are separate, though some re-use others after converting the graph into a different representation. My point is that with a single typeclass-based implementation of dfs
none of this boilerplate would be necessary, including the conversions between representations.
Algorithms are generally defined in terms of the operations you can perform on the graph, not the underlying representation, so it would be a simpler design to put these basic operations in the typeclass (as is already done) but leave the algorithm out of the typeclass (as is not done, resulting in duplicate definitions and boilerplate). If an efficient version of those operations are not available, then the representation should not implement that typeclass, and then the user knows clearly that they must perform an explicit conversion into a different representation in order to run the algorithm.
@infinity0 I found that trying to squeeze graph algorithms into type classes falls apart pretty quickly, but I'm open for experiments. As an example, I had to ditch type classes from the testsuite because it was just horrible, replacing them with a somewhat less horrible data type:
https://github.com/snowleopard/alga/blob/b9714bb2fc963189e0d338b6942a68aed37c132f/test/Algebra/Graph/Test/API.hs#L61-L144
Can you propose a concrete design, porting existing implementations of dfs
, bfs
, topSort
and scc
into a type class based implementation? To me, this seems like a pretty big, difficult and not very fun task, with a high risk of failure and unclear benefits. That's why I personally do not invest time into it. But if this is something you are interesting in exploring, I'm happy to advise/give feedback.
Thanks. I'll have a go at some point - my main motivation was to implement a max-flow-min-cost algorithm, which I could do in terms of ToGraph
but then it looks out-of-place because everything else is implemented as a class-method rather than a top-level method with a class-constraint. For example fgl implements its algorithms in terms of their Graph typeclass including "more fundamental" ones like bfs/dfs.
AFAICT, and correct me if I'm wrong, but there are effectively two kinds of representations here - one based on adjacency maps, and one based on your algebraic graphs concept; however the "traditional" traversal-based algorithms tend to work better on adjacency maps. fgl uses essentially an adjacency-map representation (1, 2) which makes it fairly straightforward to implement their match
function, which is part of their inductive graph API, and all their algorithms are implemented in terms of that.
I actually think that (at least conceptually) both these libraries could be merged together, and then we could have functions converting between algebraic vs inductive APIs, with concrete representations implementing only one of the APIs. Certain algorithms inherently work better on one or the other kind, and should only assume one of the constraints. Users that have a representation not implementing a constraint, must then explicitly call a conversion function, to convert it into another representation with the right constraint.
One design aspect mentioned in the paper is that you thought fgl
's partial functions were not great, which I agree, but I think that could be improved straightforwardly by making them either check for the precondition, or give a no-op, or be idempotent, or something along those lines - just like how Overlay
is a no-op for certain cases rather than failing.
Thanks!
AFAICT, and correct me if I'm wrong, but there are effectively two kinds of representations here - one based on adjacency maps, and one based on your algebraic graphs concept; however the "traditional" traversal-based algorithms tend to work better on adjacency maps.
Yes. However, by combining the algebraic representation with sparsify
and classic algorithms you can process some dense graphs very efficiently, in the time and memory proportional to the size of their algebraic representation, which can be quadratically smaller than the number of edges.
As of now, there are no algorithms for solving even basic problems like DFS/BFS directly on the algebraic representation, but I'm (slowly) working towards that goal.
One suggestion I have is to not build on top of ToGraph
because it has some methods related to the algebraic representation: "Graph" in the name stands for the Graph
data type from Algebra.Graph
. Perhaps, you want something like Data.Graph.Traversal
where you will provide an interface with a few primitives that are required for classic graph algorithms. It will have nothing to do with algebraic graphs and will therefore live in the Data.Graph
namespace.
When comparing to fgl
you should keep in mind that the type of vertices is fixed to Int
internally, which simplifies some things. In this library, my goal was to provide a parametric interface, which makes it difficult to reconcile parametric representations from non-parametric like AdjacencyIntMap
. I hope you'll manage to come up with an interface that can support efficient algorithms for both.
Has anyone thought more about this? I find this issue quite interesting.
I'm not aware of any progress in the context of this library.
I remain a little sceptical of the goal of writing graph algorithms against an abstract interface because implementations of this interface differ dramatically in terms of the complexity of individual methods.
Having said that, I'm interested in seeing a proof of concept!
I will be giving this some more thought. I am hoping that there is some sort of Haskell graph algorithms library similar to what LEDA offers. Maybe an abstract interface that comes with implementation hints/constraints (e.g. O(1) lookup for certain things) could be explored.
@dataopt Are you saying that LEDA implements graph algorithms against an abstract graph interface? If yes, could you point to an example?
I haven't expressed myself clearly. LEDA has its own graph representation. What I meant to say was to be able to have an interface for the graph algorithms that LEDA offers against an abstract graph interface in Haskell. (Not sure if that's even possible.) In the meantime, I came across this old Haskell wiki article that I have not seen before. I'm also trying to decipher this master's thesis which mentions Alga
.
Got it. Thanks for the links, they are new to me too!
(Oops, actually, I did come across the master thesis before but never had a chance to read it in more detail. Will try this time!)
I could be missing some important nuances but I am just thinking out loud in the following.
(Aside: I prefer to use the definitions of graphs and directed graphs in Bondy and Murty where a directed graph is a triple (V, E, f) where f is a function that maps each element of E (i.e. an edge) to an ordered pair (u,w) with u and w being elements of V. This definition permits parallel edges which could be helpful in certain situations.)
It seems to me that a reasonably comprehensive interface for graphs must provide at the minimum support for vertex and edge labels and the ability to add/remove vertices and edges as well as query incident edges given a vertex. With these, all sorts of graph algorithms can be implemented. Whether such implementations will be efficient is of course the million-dollar question.
As for efficiency, I am wondering if some sort of state (like a StatefulGraph
type class?) can be used for memoizing expensive operations. For example, if the underlying graph representation doesn't use an adjacency list so that querying incident edges is an expensive operation, the stateful graph could "cache" the result of such a query. This approach saves one from converting between representations up front. Clearly, there are a lot of technical challenges to overcome with this approach.
Looks like there is a library of graph algorithms that actually don't need a concrete definition of a graph: https://hackage.haskell.org/package/search-algorithms-0.3.2/docs/Algorithm-Search.html
@dataopt This all makes sense to me. I agree that it's interesting to do some experiments in this space. (If only I had time myself!)
I do like the idea behind the search-algorithms
library but my understanding is that it doesn't do the caching tricks you described, so one essentially needs to use an efficient data structure to implement various functional queries. And I don't think there is any hope to make that library work efficiently with the algebraic graph representation, for instance, because the cost model is just too different.
@snowleopard BTW, is there any connection between your work on algebraic graphs and GraphBLAS?
I don't think there is any connection.