alga icon indicating copy to clipboard operation
alga copied to clipboard

Make Graph datatype abstract, remove awkward instances

Open snowleopard opened this issue 6 years ago • 14 comments

We currently expose the implementation of the Graph datatype:

data Graph a = Empty
             | Vertex a
             | Overlay (Graph a) (Graph a)
             | Connect (Graph a) (Graph a)
             deriving (Foldable, Functor, Show, Traversable)

This is problematic:

  • Users can observe different graph structures even though underlying graphs are equal according to the laws of the algebra, e.g. Empty == Overlay Empty Empty.

  • Other representations may be more efficient (e.g. see #90) but we can't experiment with them without potentially breaking user code.

While doing this, we should probably dump awkward instance such as Foldable, since one can also use it to reveal internal structure, e.g. toList [1] == [1] but toList [1 + 1] == [1, 1]. We have an equivalent of the Foldable type class tailored for graphs, which we call ToGraph, which relies on the user to provide correct folding functions to avoid such issues.

snowleopard avatar Jul 18 '18 10:07 snowleopard

So I think the right way to go is using pattern synonyms.

It allows to "hide" the real constructor, but still allows pattern matching (even if it can be really bad in performances). For examples,

  • that is what I used while experimenting #90 , see https://github.com/nobrakal/alga/blob/6569e64e206654d0308c475017de4ff9d3bccebe/src/Algebra/Graph.hs#L166

  • I also used PatternSynonyms for a more complex case (with an explicit bi-directional pattern using ViewPatterns) https://github.com/nobrakal/alga/blob/5cc3f7d872084305684e208405e3e1c88bc6e425/src/Algebra/Graph.hs#L174

So I think it can cover almost all possible representations :)

While doing this, we should probably dump awkward instance such as Foldable

I agree that this instance is awkward, but sadly, for example, vertexList require an Ord instance on the vertices where toList does not require any. Also losing Fodable implies losing Traversable which can be useful to someone. Maybe we can move these instances in another module, so they do not get imported with Algebra.Graph and someone with the need of such instances can have them, but he will have to do another import, with a lot of warnings in the documentation.

nobrakal avatar Jul 18 '18 12:07 nobrakal

So I think the right way to go is using pattern synonyms.

The whole point of hiding constructors is to avoid leaking any information about the structure of the graph: by providing pattern synonyms you will allow the user to distinguish x vs x + x unless you plan to always keep graphs in some normal form (which may be expensive). So I think we don't need pattern synonyms -- we'll just hide the constructors, forever :)

vertexList require an Ord instance on the vertices where toList does not require any

That's true, but I don't think it is possible to implement vertexList without Ord and without revealing the structure of the graph.

Note that we could add unorderedVertexList for internal use which would be equivalent to toList:

unorderedVertexList :: Graph a -> [a]
unorderedVertexList = foldg [] pure (++) (++)

Also losing Foldable implies losing Traversable which can be useful to someone.

Are you sure they are genuinely useful? My impression is that pretty much any use of them is a bug waiting to happen.

Maybe we can move these instances in another module

This will lead to orphan instances, I would prefer to avoid that.

snowleopard avatar Jul 18 '18 13:07 snowleopard

... we'll just hide the constructors, forever :)

Ah I forgot about impotence ! But you are right, goodbye poor little constructors ^^

Note that we could add unorderedVertexList for internal use which would be equivalent to toList...

Yes, that is true.

...My impression is that pretty much any use of them is a bug waiting to happen.

Certainly ^^. Moreover, everything is definable for anyone using foldg, right ? So one can define an orphan instance of Foldable and Traversable if it really need it.

For sure, all of this will need some explanations somewhere !

Note that we will also have to change Algebra.Graph.NonEmpty for coherence, I think.

nobrakal avatar Jul 18 '18 18:07 nobrakal

Moreover, everything is definable for anyone using foldg, right ?

I think everything from the Foldable class is implementable using foldg. But perhaps we could also add a safe equivalent of traverse?

Note that we will also have to change Algebra.Graph.NonEmpty for coherence, I think.

Yes, you are right.

snowleopard avatar Jul 18 '18 21:07 snowleopard

But perhaps we could also add a safe equivalent of traverse?

Is

traverse f = foldg (pure Empty) (fmap Vertex . f) (liftA2 Overlay) (liftA2 Connect)

suitable ?

But if by safe you meant something that doesn't show the inner representation, I don't know if it is possible.

Once you have traverse, you can have foldMap (see foldMapDefault)

nobrakal avatar Jul 19 '18 09:07 nobrakal

@nobrakal Indeed, users will be able to implement traverse on their own using foldg as you described.

But if by safe you meant something that doesn't show the inner representation, I don't know if it is possible.

By "safe" I meant the user is supposed to follow the laws specified in the foldg method:

https://github.com/snowleopard/alga/blob/master/src/Algebra/Graph/ToGraph.hs#L54-L57

The method 'foldg' is used for generalised graph folding. It collapses a given value by applying the provided graph construction primitives. The order of arguments is: empty, vertex, overlay and connect, and it is assumed that the arguments satisfy the axioms of the graph algebra.

In case of your traverse, we have overlay = liftA2 Overlay and require that overlay x x = x (plus all other graph laws). If the user violates this requirement, it will be possible to observe the inner graph representation, which might lead to a bug, but it's not our fault -- we give a gun to users, but they should aim it at their leg and shoot by themselves :)

snowleopard avatar Jul 19 '18 14:07 snowleopard

@snowleopard So is there anything else to do here ?

About Foldable it now seems to me that with mentioning the possibility of defining such instance, and its risks somewhere, it is reasonable to remove it of the API.

nobrakal avatar Sep 20 '18 19:09 nobrakal

@nobrakal Yes, let's start by removing problematic instances. And then the next step will be to make Graph abstract. Feel free to send a PR if you like to try this!

snowleopard avatar Sep 20 '18 21:09 snowleopard

Yes, let's start by removing problematic instances.

Just realized that we will have a problem with with Algebra.Graph.HigherKinded.Class, which is based on the Foldable instance... In this module, we need:

  • null
  • elem
  • foldr
  • toList

We could require foldg to implement locally all of this, or even just a foldr function, to redefine all of these needed functions. But I am not sure it is the right way to follow :)

nobrakal avatar Sep 21 '18 07:09 nobrakal

@nobrakal Perhaps, we should just drop these functions from the API. I think all of them will be available via the ToGraph typeclass if need be.

snowleopard avatar Sep 21 '18 21:09 snowleopard

Are you sure they (Traversable instance) are genuinely useful? My impression is that pretty much any use of them is a bug waiting to happen.

I am currently modeling electrical networks using graphs from the containers package, and I use traverse everywhere! I am currently refactoring to use algebraic-graphs instead (specifically, Algebra.Graph.Labelled).

I don't understand why traverse on algebraic graphs is a bug waiting to happen. Foldable, sure, because it exposes internal structure. But isn't traversing the graph structure without changing it, inherently safe?

What is it about this definition of traverse that can lead to bugs?

LaurentRDC avatar Jan 23 '24 14:01 LaurentRDC

Actually I was bit by unexpected behavior relating to Foldable and Traversable instances for graphs.

Consider this code :

import Algebra.Graph.Labelled

graph :: Graph (Capacity Int) Char
graph = edges [(0, 'a', 'b'), (0, 'b', 'c'), (0, 'c', 'a')]

If Graph had a Foldable instance, what would expect for length graph? I would expect 3 -- there are three nodes, 'a', 'b', and 'c'. If Graph had a Traversable instance, what would expect for traverse print graph? I would expect to see 'a', 'b', and 'c' each printed once.

The previous instances for Foldable and Traversable did not follow the behavior I expected. I ended up defining a graph datatype like so instead:

type NodeID = Int

data Network e a 
    = MkNetwork { networkTopology :: Graph (Capacity e) NodeID
                , networkNodeMap  :: Bimap NodeID a
                }

where Bimap is a bijective map (unique keys AND unique values) like the one provided by this package.

Using a data structure like this -- separating the graph nodes from the connection topology -- allows you to write intuitive Foldable and traverse where nodes are visited only once.

Hopefully this can help others

LaurentRDC avatar Jan 30 '24 13:01 LaurentRDC

@LaurentRDC Yes, both Foldable and Traversable make it possible to observe the inner structure of algebraic graph expressions in a way that violates the laws of the algebra. I don't know of a good solution at the moment.

P.S. I'm curious about your application (modeling electrical networks). Where can I read more about it?

snowleopard avatar Jan 30 '24 14:01 snowleopard

P.S. I'm curious about your application (modeling electrical networks). Where can I read more about it?

This is not public work yet. Hopefully I'll be able to share details in the coming months. I'll try to remember to ping back then.

LaurentRDC avatar Jan 30 '24 14:01 LaurentRDC