alga Rename Graph type class

Rename Graph type class

Open phadej opened this issue 6 years ago • 17 comments

It's quite inconvenient to use class and data at the same time.

As data Graph directly encodes methods of the class, i'd name it Syntax, Grammar or Alg, but those might be quite weird names to use.

Nov 11 '17 12:11 phadej

@phadej I agree that it's very inconvenient, so thanks for raising this issue!

My plan was to eventually change the name of the type class, because the algebra works not only for graphs, but, for example, equivalence relations (combination of Reflexive, Undirected and Transitive subclasses), and all sorts of other strange (not exactly graph-like) instances.

On the other hand, a concrete data type Graph has a fixed Eq instance which currently corresponds exactly to (directed) graphs.

Having data Graph also seems to correspond nicely with standard libraries where we have build-in lists, data Tree, data Forest and other concrete data types. Naming this data type Expr (as I had before) or Syntax (as you suggest) feels a bit awkward...

Nov 11 '17 14:11 snowleopard

I see, renaming the class will work for me too.

Nov 11 '17 14:11 phadej

In a recent discussion with colleagues, Plex came up as a possible name for the type class (there seem to be a nice correspondence between the algebra and topological concepts like simplexes and complexes). I will write a blog post about this soon.

If you (or anyone else) have any suggestions for the name of the type class, please let me know!

Nov 11 '17 14:11 snowleopard

A simple and boring option is IsGraph (which of course doesn't address the "what if it isn't" issue from @snowleopard's previous comments).

Nov 11 '17 15:11 glaebhoerl

@glaebhoerl Yes, something like IsGraph is an option, but I believe we are looking for an abstract term.

As an example, consider rings as a common abstraction for numbers. Then instance Ring Integer and instance Ring Matrix are better than instance IsInteger Integer and instance IsInteger Matrix.

In the same spirit I want to be able to define instance X Graph and instance X EquivalenceRelation for some X. But I fear no standard X exists which is abstract enough so we need to come up with a new term, such as, for example Plex.

Nov 12 '17 01:11 snowleopard

One note: as there's Graph class, there could be NonEmptyGraph, (think Monoid and Semigroup).

one could have e.g.

vertices1 :: NonEmptyGraph g => NonEmpty (Vertex g) -> g
vertices1 = foldr1 overlay . map vertex

In the paper 3.1 section could have

(G,+) is a idempotent commutative semigroup
(G,->) is a semigroup
-> distributes over +

I'm quickly skimming thru the paper, and empty isn't used that much there!

This just a thought, not that we should have such class in algebraic-graphs, but maybe it will guide the naming debate.

Nov 13 '17 09:11 phadej

@phadej Indeed! I've set up a separate issue #31 to discuss non-empty graphs.

I'm quickly skimming thru the paper, and empty isn't used that much there!

You are right, non-empty graphs are quite interesting and useful by themselves -- you can do a lot of things without empty. There are also many situations when we do want empty graphs, so I think Alga should provide both (just like with lists).

I think whatever X we choose as the name of the type class, we'll need to have both class NonEmptyX g and class NonEmptyX g => X g, with the latter having just a single method empty. Does this sound right?

Nov 13 '17 19:11 snowleopard

Another direction is to come up with a better name for the instances instead of renaming the classes, like AlgGraph, or Ind(uctive)Graph, or even FreeGraph.

I like this better as this at least says something about the used representation. IMHO the analogy with Tree, Map is a bit of a red herring as they are not backed by a corresponding type class.

Dec 28 '17 15:12 boggle

@boggle I like FreeGraph.

IMHO the analogy with Tree, Map is a bit of a red herring as they are not backed by a corresponding type class.

Perhaps, we don't have a type class for lists, trees, etc. because free ADTs are perfect for most use-cases? I'm starting to doubt whether type classes bring us a lot of benefit, since in Haskell it's always easy to reinterpret an expression in ADT such as data Graph however you like without necessarily involving type classes. We can treat data Graph as the core language for graphs and one can always extract an adjacency list or any other graph representation from it. I think it doesn't cost us anything in terms of performance.

Any generic function that we can write for a polymorphic Graph g => g can be written much easier for data Graph, since we can use pattern-matching without sophisticated tricks like Boehm-Berarducci encoding using newtype Fold a (as in Algebra.Graph.Fold).

So, if class Graph is potentially much less important than data Graph, it can surrender the name Graph.

@ndmitchell was saying this to me a year ago, but I was too excited about polymorphic graphs at that time and couldn't hear him :)

Dec 28 '17 23:12 snowleopard

Happy new year! We need to watch out that we are not trying to solve a deeper problem here (cf. http://degoes.net/articles/kill-data). Type classes are what haskell provides for abstracting over different implementations. I also agree that data Graph has some primacy over other implementations. The general direction that starts to emerge here of having data Graph and a class for values that hold representations of graphs (class IsGraph) seems sound.

Jan 27 '18 12:01 boggle

@boggle Happy new year! I've actually read the blog post you linked to, right at the time of my excitement with polymorphic graphs and the finally tagless pattern in general, but thanks for reminding me about it. Some of my current hesitation about the need for the type class is actually well articulated in the comments, for example, this one:

You can trivially convert the Either output into whatever form you need. If your compiler is any good, the conversion will be invisible in the produced code: the Either will never actually be produced at runtime, and the additional boilerplate when writing your code is a minimal one-off, too.

Let's look at one of the current instances of class Graph, for example AdjacencyMap. Is it a good instance? Yes and no. On the one hand, it does satisfy the laws, and allows you to seemlessly reuse polymorphic code when constructing graphs. However, if you need to construct a large AdjacencyMap you probably wouldn't want to actually evaluate a polymorphic expression, because that would cause a lot of unnecessary work. Imagine constructing clique [1..10000] in this manner: this will perform a lot of unnecessary intermediate computation in order to yield a fairly trivial AdjacencyMap in the end. A more efficient approach would be to have a function toAdjacencyMap :: Graph a -> AdjacencyMap a that avoids unnecessary computation and performs an optimised compilation of the given graph expression. You could in fact put this in a type class class FromGraph f where fromGraph :: Graph a -> f a or something similar if you'd like to avoid creating functions like toAdjacencyMap for each possible graph representation.

Do you have a specific use case for polymorphic graphs? Let's look into it, taking the above performance cosiderations into account.

Feb 01 '18 19:02 snowleopard

No, not per se. I was more reacting to what you wrote above and make sure we don't try to fix a problem that's more related to haskell than to algra. I can imagine though someone wanting to write graph representation agnostic code (i.e. for plain data import or using a data structure that bundles and arbitrary graph with some metadata) but perhaps an auxiliary FromToGraph type class should be more than enough for that.

Feb 01 '18 20:02 boggle

A more efficient approach would be to have a function toAdjacencyMap :: Graph a -> AdjacencyMap a that avoids unnecessary computation and performs an optimised compilation of the given graph expression.

This sounds like a really good idea and could even allow other instances based on immutable arrays that would be too expensive at the moment. Do you have some new thoughts about this?

This might be important for an implementation of #17

Apr 18 '18 15:04 anfelor

This sounds like a really good idea and could even allow other instances based on immutable arrays that would be too expensive at the moment. Do you have some new thoughts about this?

@anfelor I'd love to have an efficient array-based implementation using this idea, but I haven't had time to explore it yet. I think we should aim at implementing a function that takes a Graph a and a mapping from vertices to indices a -> Int and produces something like an AdjacencyArray in O(|V| + |E|) time:

toGraphArray :: Graph a -> (a -> Int) -> AdjacencyArray

By the way, here is my old attempt at implementing an array-based Graph instance:

https://github.com/snowleopard/alga/blob/febf7a4b8ce9d5059ca1ff5ba27de795c53b673f/src/Algebra/Graph/AdjacencyArray/Unboxed.hs

It worked, but was extremely slow, of course :) But we might still reuse some code.

Apr 18 '18 22:04 snowleopard

This instance doesn't allow to share common subgraphs and is therefore typically very slow.

I don't see why this is a problem, isn't not sharing subgraphs the default for basically every graph library? I would guess the reason why it is so slow is because overlay and connect are O(n + m), making fromEdgeList O((n + m)^2) (where n=|V| and m=|E|). Shouldn't a toGraphArray function avoid this?

a mapping from vertices to indices a -> Int a

Is Int necessary? Maybe Unbox a from the vector package would be more flexible?

By the way, here is my old attempt at implementing an array-based Graph instance:

Thanks for sharing! If you want I can try to update it. My toGraphArray would traverse the graph twice; once to put all the vertices into a set and again to populate a array of the right size, in total O(n + m).

Apr 19 '18 14:04 anfelor

I don't see why this is a problem, isn't not sharing subgraphs the default for basically every graph library?

@anfelor Take the AdjacencyMap instance: you have a lot of sharing between various subgraphs within the same expression. For example, clique [1..100] will likely store sets {100}, {99,100}, {98,99,100} only once in memory. Without this, AdjacencyMap would also be very slow and likely unusable. GraphArray has no way to achieve that: it has to allocate memory for every intermediate graph during the computation of the final graph.

Is Int necessary? Maybe Unbox a from the vector package would be more flexible?

This may be possible, but I guess you'll then also need Ord a and you'll have to store the set of all vertices in some kind of Set incurring the logarithmic overhead. On the other hand, if the a -> Int function compresses all vertices into a tight integer interval, you'll likely be able to compile graphs in linear time and memory. But having support for Unbox a would be nice, of course.

If you want I can try to update it. My toGraphArray would traverse the graph twice; once to put all the vertices into a set and again to populate a array of the right size, in total O(n + m).

That would be great! Making it O(n + m) would probably be a bit tricky. Are you sure you are not actually talking about something like O(n * log(n) + m) instead?

By the way, let's use the name AdjacencyArray instead of GraphArray. I think it's more consistent.

Apr 19 '18 15:04 snowleopard

Take the AdjacencyMap instance: you have a lot of sharing between various subgraphs within the same expression.

Ah, sharing while constructing the graph and not in the fully evaluated graph. Of course, the old implementation doesn't do that.

Are you sure you are not actually talking about something like O(n * log(n) + m) instead

No, I am not :).

In the pull request #62 I have extracted the GraphKL field from AdjacencyMap into its own module. Do you think that would be sufficient as a array based implementation, or do we need another one based on matrices (as opposed to an array refering to lists)?

Apr 20 '18 15:04 anfelor

alga alga copied to clipboard

Rename Graph type class

alga
alga copied to clipboard