graph Proposal: Decouple vertice/edge store from graphs

Hey there,

Let me start off by saying a big thank you for releasing this as open source. I really like the use of hashes for identifying vertices as it allowes for more flexibility.

I was attempting to split out the storage/retrieval of vertices/edges to a Store interface and eventually implement a sql store (mainly to figure out how horrible the performance would be). While doing so, I bumped into a couple of things I ended up changing that I wasn't really comfortable with and wanted to ask a couple of questions if that's ok.

At this point can I just say that this is meant as critique, but rather as genuine questions from someone who hasn't spent much time dealing with graphs. I'm just trying to understand the thinking behind the design decisions and avoid making mistakes because I did not understand or think of something.

This PR is a work in progress and is mainly to get a discussion started. In its current state it only includes an in-memory store as a POC, but tests should be all passing. (Memory store tests are needed.)

I'd be more than happy to work with you on getting this right and merging it upstream, but I totally understand if this is not something you'd be interested in; in which case please feel free to close this PR at any point. :)

So, couple of questions:

What's the need for the un/directed.edges attribute? From what I can tell, in both directed and undirected it's the same as outEdges. I've kinda removed it as I wasn't really sure on what the usecase was.
Currently the Edge[T] struct contains the source/target vertices. Besides a more developer-friendly API, is there a reason for the source/targets to not just be hashes? I played around with using Edge[K] instead, as it greatly simplifies the Store implementations.
Do you have any plans currently on adding extra attributes to the Edge besides weight? ie. Labels maybe?
The readme mentions that multigraphs are not supported. Is this something you are planning or would be interested in? It would pose a number of interesting questions around the API for both the graph and the Store.

Thank you very much in advance! Regards, ~geoah

Jul 17 '22 02:07 geoah

Hi, thanks for your proposal and your questions! I'm happy to answer them.

At this point, egdes only exists for historical reasons. I decided to maintain an outEdges map explicitly for outgoing edges, and an edges map containing all edges. I wasn't sure whether these two maps would be identical, but with the current implementation, they are. I haven't optimized the library for better time and space efficiency yet, but one of the first steps definitely would be to remove edges.
The reason behind this is only the more developer-friendly API. This way, you can directly access the underlying vertex object without looking it up via its hash. I think this works well enough for many if not most use cases, but I'm also aware that this leaves room for optimization.
Absolutely! I'll create an issue for adding labels.
I would love to support multigraphs in theory, but it makes everything much more complicated. If you're designing a storage API, you don't need to have multigraphs in mind.

Jul 17 '22 07:07 dominikbraun

Opened #13 for edge labels.

Jul 17 '22 08:07 dominikbraun

Hey @dominikbraun, thank you for your reply and answering my questions.

Do you think splitting out the store interface would be of use to this library or would it be outside of its scope?

If it’s something you would consider, when you have some spare time I’d appreciate some feedback and suggestions on the current approach and I'd be more than happy to refactor it.

Jul 17 '22 12:07 geoah

Making the store implementation flexible would be useful indeed. I think it might make sense to prescribe a store API and provide an in-memory default implementation like the current one - anything beyond that can then be implemented by the user or provided by an extra library.

I'll take a look at your approach later today 🙂

Jul 17 '22 12:07 dominikbraun

So I took a look at your proposal and I like the design so far. I only have three thoughts or possible improvements in mind:

I'd love to default to an in-memory store and accept a custom store as a functional option. At the moment, creating a graph would look something like this:

graph.NewWithStore(graph.IntHash, graph.NewMemoryStore(graph.IntHash), graph.Weighted())

That IntHash part is kind of repetitive. I would prefer something like this:

graph.New(graph.IntHash(), graph.Weighted())

And with a custom store:

graph.New(graph.IntHash(), graph.Weighted(), graph.WithStore(myStore))

The problem is that there already are functional options for traits. To implement my proposed solution, we would need to create an options type for the functional options API, and read the fields of these options into g.traits and g.store when creating a graph. Would you see any problems with this approach?

We should try to keep the Store interface as small as possible. I didn't check it in detail yet, but maybe the graph can build the result of something like GetEdgeTargetHashes itself using GetEdgeTargets? Perhaps there is an opportunity to remove a method if it is only used once or twice and the result can be built ourselves.
The naming of variables and functions, not just in your PR but in the entire codebase, needs to be unified a bit. For example, GetEdgeTargets returns a variable sourceEdgesArray - for the sake of readability, I'd argue that either the variable name should be edgeTargets or the function name should be sourcesEdges. This is a nitpick though.

All in all, I'm quite a fan of this feature!

Jul 17 '22 21:07 dominikbraun

That's really nice to hear. :D

I struggled with this a bit and the issue I had was that in order for me to add a Store[K, T] in the traits, I had to make the traits struct itself generic which would result in having to instantiate all options with the correct types. (ie graph.Directed[int, int]() which would be horrible. -- I'll revisit this and try to get them into the existing options, hopefully without much reflection.
That makes sense, at this point I tried to lift and shift as much code as possible to keep things simple. -- I'll check what can be consolidated to reduce the surface of the interface without affecting performance too much.
Couldn't agree more. Naming can definitely be improved in a lot of places to make things more clear and consistent.

Let me play around with this a bit more and I'll ping you with some updates!

Jul 17 '22 23:07 geoah

I see... Making traits generic comes with a huge overhead and won't be a viable solution. So there would be two possibilities:

Include the Hash function in Store, only accept a Store, and provide an InMemory function that takes a hash function and returns a generic in-memory store. Example:

type Store[K comparable, T any] interface {
	GetVertex(K) T
}

func InMemory[K comparable, T any](hash Hash[K, T]) Store[K, T] {
	return inMemoryStore[K, T]{
                hash: hash,
        }
}

func New[K comparable, T any](store Store[K, T], options ...func(*traits)) Graph[K, T] {
        // ...
}

The New call would look as follows:

g := graph.New(graph.InMemory(graph.IntHash), graph.Weighted())

If somebody wanted to use a custom data type, they could still use the generic in-memory store, and if somebody wanted to use a custom store, they could still use the pre-defined hashing functions.

Create a NewWithStore function as you already did. Sorry for bothering you with design challenges that you've already considered 🤣

The disadvantage of the NewWithStore function is that the hashing function needs to be passed twice. To circumvent this, we could go for a hybrid approach where the New function stays as it is, and the NewWithStore function accepts a store and we provide an InMemory wrapper function as shown above. The calls would look as follows:

g := graph.New(graph.IntHash, graph.Weighted())
g := graph.NewWithStore(graph.InMemory(graph.IntHash), graph.Weighted())

New could then just call NewWithStore with the in-memory store. What do you think of this approach?

Jul 18 '22 09:07 dominikbraun

Hey @dominikbraun, quick update.

Effing generics...

I noticed that we don't really need the hashing function in the store so did some refactoring to remove it, but even then, the graph.New() is still a pain in the butt.

We still can't put the Store[K, T] on the trait struct, and now because we don't provide the hash function for go to infer infer K and T, instead of graph.NewMemoryStore(graph.IntHash) we need to do graph.NewMemoryStore[int, int]().

I'll keep working on this and try some your suggestions.

In slightly better news, the Store interface looks a bit slimmer. Attempted to rename some of the methods to make them a bit more understandable and now we have the following:


type Store[K comparable, T any] interface {
	AddVertex(k K, t T) error // This allowed me to remove the need for the hashing function in the Store
	GetVertex(k K) (*T, bool)
	ListVertices() ([]K, bool) // Still thinking about this one vs GetAllVertexHashes()
	AddEdge(sourceHash, targetHash K, edge Edge[K]) error
	GetEdge(source, target K) (Edge[K], bool)
	GetEdgesBySource(source K) ([]Edge[K], bool)
	GetEdgesByTarget(target K) ([]Edge[K], bool)
}

Jul 18 '22 21:07 geoah

Nice! Not needing the hashing function in Store makes things much simpler. To be honest, function calls like graph.New(graph.IntHash, graph.Directed()) and graph.NewWithStore(graph.IntHash, graph.InMemory(), graph.Directed() are fine for me. That way, you don't need to put the Store into the traits IMO.

Jul 21 '22 08:07 dominikbraun

@dominikbraun would have been nice if I had actually pushed the code I was talking about.

Pushed now, you should be able to check it out. I've also added a bit more consistent error handling as it's really useful for when dealing with any non in-memory stores.

Jul 21 '22 11:07 geoah

FYI: I've just merged PR #17 and released it as v0.5.0. This would need to be changed here as well.

Jul 21 '22 18:07 dominikbraun

I finally found a use case for the edges map, at least in the undirected graph: I'm planning to implement a EdgesWithHashes function that returns a []Edge[K] (notice K instead of T), and I see no other way to do this than registering each undirected edge in that map. Using outEdges or inEdges would returns all edges twice.

But that doesn't need to affect the Store interface. The undirected graph can keep track of this itself.

Jul 22 '22 06:07 dominikbraun

Just a kudos for having NewWithStore and an interface, precisely what I was looking for (need serialization).

Aug 07 '22 09:08 sdressler

Hey @dominikbraun, just came back from vacation and I was picking this up again when I bumped into #25. I might wait until the dust has settled on that one as I think it will greatly simply decoupling the graph from the storage.

Aug 07 '22 23:08 geoah

Ah, I've waited with #25 because I thought this PR and #25 would cause too many conflicts, but if you feel that #25 will simplify things here, we'll prioritize #25.

Aug 08 '22 09:08 dominikbraun

@dominikbraun I assume that the plan for #25 is that all other graph methods will be using the new API right? In that case this PR will only have to refactor the basic graph implementation and not the other graph methods.

Aug 08 '22 10:08 geoah

@dominikbraun I assume that the plan for #25 is that all other graph methods will be using the new API right? In that case this PR will only have to refactor the basic graph implementation and not the other graph methods.

Yap.

Aug 08 '22 10:08 dominikbraun

Re-opening to start with a clean thread :D

Sep 19 '22 17:09 geoah

@geoah That seems to be the best solution since this PR is already a bit old and too large :-)

Sep 19 '22 17:09 dominikbraun

graph graph copied to clipboard

Proposal: Decouple vertice/edge store from graphs

graph
graph copied to clipboard