graph
graph copied to clipboard
Proposal: Decouple vertice/edge store from graphs
Hey there,
Let me start off by saying a big thank you for releasing this as open source. I really like the use of hashes for identifying vertices as it allowes for more flexibility.
I was attempting to split out the storage/retrieval of vertices/edges to a Store
interface and eventually implement a sql store (mainly to figure out how horrible the performance would be). While doing so, I bumped into a couple of things I ended up changing that I wasn't really comfortable with and wanted to ask a couple of questions if that's ok.
At this point can I just say that this is meant as critique, but rather as genuine questions from someone who hasn't spent much time dealing with graphs. I'm just trying to understand the thinking behind the design decisions and avoid making mistakes because I did not understand or think of something.
This PR is a work in progress and is mainly to get a discussion started. In its current state it only includes an in-memory store as a POC, but tests should be all passing. (Memory store tests are needed.)
I'd be more than happy to work with you on getting this right and merging it upstream, but I totally understand if this is not something you'd be interested in; in which case please feel free to close this PR at any point. :)
So, couple of questions:
-
What's the need for the
un/directed.edges
attribute? From what I can tell, in both directed and undirected it's the same asoutEdges
. I've kinda removed it as I wasn't really sure on what the usecase was. -
Currently the
Edge[T]
struct contains the source/target vertices. Besides a more developer-friendly API, is there a reason for the source/targets to not just be hashes? I played around with usingEdge[K]
instead, as it greatly simplifies theStore
implementations. -
Do you have any plans currently on adding extra attributes to the
Edge
besides weight? ie. Labels maybe? -
The readme mentions that multigraphs are not supported. Is this something you are planning or would be interested in? It would pose a number of interesting questions around the API for both the graph and the Store.
Thank you very much in advance! Regards, ~geoah
Hi, thanks for your proposal and your questions! I'm happy to answer them.
-
At this point,
egdes
only exists for historical reasons. I decided to maintain anoutEdges
map explicitly for outgoing edges, and anedges
map containing all edges. I wasn't sure whether these two maps would be identical, but with the current implementation, they are. I haven't optimized the library for better time and space efficiency yet, but one of the first steps definitely would be to removeedges
. -
The reason behind this is only the more developer-friendly API. This way, you can directly access the underlying vertex object without looking it up via its hash. I think this works well enough for many if not most use cases, but I'm also aware that this leaves room for optimization.
-
Absolutely! I'll create an issue for adding labels.
-
I would love to support multigraphs in theory, but it makes everything much more complicated. If you're designing a storage API, you don't need to have multigraphs in mind.
Opened #13 for edge labels.
Hey @dominikbraun, thank you for your reply and answering my questions.
Do you think splitting out the store interface would be of use to this library or would it be outside of its scope?
If it’s something you would consider, when you have some spare time I’d appreciate some feedback and suggestions on the current approach and I'd be more than happy to refactor it.
Making the store implementation flexible would be useful indeed. I think it might make sense to prescribe a store API and provide an in-memory default implementation like the current one - anything beyond that can then be implemented by the user or provided by an extra library.
I'll take a look at your approach later today 🙂
So I took a look at your proposal and I like the design so far. I only have three thoughts or possible improvements in mind:
- I'd love to default to an in-memory store and accept a custom store as a functional option. At the moment, creating a graph would look something like this:
graph.NewWithStore(graph.IntHash, graph.NewMemoryStore(graph.IntHash), graph.Weighted())
That IntHash
part is kind of repetitive. I would prefer something like this:
graph.New(graph.IntHash(), graph.Weighted())
And with a custom store:
graph.New(graph.IntHash(), graph.Weighted(), graph.WithStore(myStore))
The problem is that there already are functional options for traits
. To implement my proposed solution, we would need to create an options
type for the functional options API, and read the fields of these options into g.traits
and g.store
when creating a graph. Would you see any problems with this approach?
-
We should try to keep the
Store
interface as small as possible. I didn't check it in detail yet, but maybe the graph can build the result of something likeGetEdgeTargetHashes
itself usingGetEdgeTargets
? Perhaps there is an opportunity to remove a method if it is only used once or twice and the result can be built ourselves. -
The naming of variables and functions, not just in your PR but in the entire codebase, needs to be unified a bit. For example,
GetEdgeTargets
returns a variablesourceEdgesArray
- for the sake of readability, I'd argue that either the variable name should beedgeTargets
or the function name should besourcesEdges
. This is a nitpick though.
All in all, I'm quite a fan of this feature!
That's really nice to hear. :D
-
I struggled with this a bit and the issue I had was that in order for me to add a
Store[K, T]
in the traits, I had to make the traits struct itself generic which would result in having to instantiate all options with the correct types. (iegraph.Directed[int, int]()
which would be horrible. -- I'll revisit this and try to get them into the existing options, hopefully without much reflection. -
That makes sense, at this point I tried to lift and shift as much code as possible to keep things simple. -- I'll check what can be consolidated to reduce the surface of the interface without affecting performance too much.
-
Couldn't agree more. Naming can definitely be improved in a lot of places to make things more clear and consistent.
Let me play around with this a bit more and I'll ping you with some updates!
I see... Making traits
generic comes with a huge overhead and won't be a viable solution. So there would be two possibilities:
- Include the
Hash
function inStore
, only accept aStore
, and provide anInMemory
function that takes a hash function and returns a generic in-memory store. Example:
type Store[K comparable, T any] interface {
GetVertex(K) T
}
func InMemory[K comparable, T any](hash Hash[K, T]) Store[K, T] {
return inMemoryStore[K, T]{
hash: hash,
}
}
func New[K comparable, T any](store Store[K, T], options ...func(*traits)) Graph[K, T] {
// ...
}
The New
call would look as follows:
g := graph.New(graph.InMemory(graph.IntHash), graph.Weighted())
If somebody wanted to use a custom data type, they could still use the generic in-memory store, and if somebody wanted to use a custom store, they could still use the pre-defined hashing functions.
- Create a
NewWithStore
function as you already did. Sorry for bothering you with design challenges that you've already considered 🤣
The disadvantage of the NewWithStore
function is that the hashing function needs to be passed twice. To circumvent this, we could go for a hybrid approach where the New
function stays as it is, and the NewWithStore
function accepts a store
and we provide an InMemory
wrapper function as shown above. The calls would look as follows:
g := graph.New(graph.IntHash, graph.Weighted())
g := graph.NewWithStore(graph.InMemory(graph.IntHash), graph.Weighted())
New
could then just call NewWithStore
with the in-memory store. What do you think of this approach?
Hey @dominikbraun, quick update.
- Effing generics...
I noticed that we don't really need the hashing function in the store so did some refactoring to remove it, but even then, the graph.New()
is still a pain in the butt.
We still can't put the Store[K, T]
on the trait struct, and now because we don't provide the hash function for go to infer infer K and T, instead of graph.NewMemoryStore(graph.IntHash)
we need to do graph.NewMemoryStore[int, int]()
.
I'll keep working on this and try some your suggestions.
- In slightly better news, the
Store interface
looks a bit slimmer. Attempted to rename some of the methods to make them a bit more understandable and now we have the following:
type Store[K comparable, T any] interface {
AddVertex(k K, t T) error // This allowed me to remove the need for the hashing function in the Store
GetVertex(k K) (*T, bool)
ListVertices() ([]K, bool) // Still thinking about this one vs GetAllVertexHashes()
AddEdge(sourceHash, targetHash K, edge Edge[K]) error
GetEdge(source, target K) (Edge[K], bool)
GetEdgesBySource(source K) ([]Edge[K], bool)
GetEdgesByTarget(target K) ([]Edge[K], bool)
}
Nice! Not needing the hashing function in Store
makes things much simpler. To be honest, function calls like graph.New(graph.IntHash, graph.Directed())
and graph.NewWithStore(graph.IntHash, graph.InMemory(), graph.Directed()
are fine for me. That way, you don't need to put the Store
into the traits IMO.
@dominikbraun would have been nice if I had actually pushed the code I was talking about.
Pushed now, you should be able to check it out. I've also added a bit more consistent error handling as it's really useful for when dealing with any non in-memory stores.
FYI: I've just merged PR #17 and released it as v0.5.0. This would need to be changed here as well.
I finally found a use case for the edges
map, at least in the undirected
graph: I'm planning to implement a EdgesWithHashes
function that returns a []Edge[K]
(notice K
instead of T
), and I see no other way to do this than registering each undirected edge in that map. Using outEdges
or inEdges
would returns all edges twice.
But that doesn't need to affect the Store
interface. The undirected graph can keep track of this itself.
Just a kudos for having NewWithStore
and an interface, precisely what I was looking for (need serialization).
Hey @dominikbraun, just came back from vacation and I was picking this up again when I bumped into #25. I might wait until the dust has settled on that one as I think it will greatly simply decoupling the graph from the storage.
Ah, I've waited with #25 because I thought this PR and #25 would cause too many conflicts, but if you feel that #25 will simplify things here, we'll prioritize #25.
@dominikbraun I assume that the plan for #25 is that all other graph methods will be using the new API right? In that case this PR will only have to refactor the basic graph implementation and not the other graph methods.
@dominikbraun I assume that the plan for #25 is that all other graph methods will be using the new API right? In that case this PR will only have to refactor the basic graph implementation and not the other graph methods.
Yap.
Re-opening to start with a clean thread :D
@geoah That seems to be the best solution since this PR is already a bit old and too large :-)