Extend graph implementations
Is your feature request related to a problem? Please describe.
Currently we have only a single graph implementation which is list based. The main problem, besides high lookup times, is that we cannot (easily) define our own indices for nodes. Instead we have to trace them in a hash map on our own.
This is annoying, because:
- Many graphs nodes we have in Rizin, could be easily identified by an address.
- Lookup times could be reduced, if we would use an adjacent matrix implementationaa with a hash map.
- ~~
rz_graph_add_nodedoesn't check, if the node was already added (because it needs to compare the user supplied data, since the index says nothing about the node). Really annoying.~~ This is "fixed" (very inefficiently) in https://github.com/rizinorg/rizin/pull/4212
Describe the solution you'd like
- Extending the list base graph, so it supports arbitrary node identifiers (and checks for edge and node duplicates).
- Add a adjacency matrix based graph.
- Add the ability to pre-allocate memory for X nodes/edges (see: https://github.com/rizinorg/rizin/issues/4231#issuecomment-3368401723)
Describe alternatives you've considered
Point 1. from above.
Additional context
We probably should resolve https://github.com/rizinorg/rizin/issues/397 before.
Also relevant is https://github.com/rizinorg/rizin/issues/992
See files:
librz/util/graph.clibrz/util/graph_drawable.clibrz/core/agraph.c
By any means necessary. The new graph should hash its nodes and edges. And document how exactly it identifies the nodes.
Currently, it compares only pointers of nodes to determine if a node is in the graph or not. So it is not possible to make a copy of a node and check if it is in the graph. Pointers will mismatch, although the content is the same, and it says no. Which is of course nonsense.
checking code. may I ask the meaning of " it supports arbitrary node identifiers" @Rot127 ?
Check RzGraphNode. It has an index idx (I think into the node list) and the data. The idx is the only unique identifier for the node. It can't have a name or something.
E.g. if you have a graph with unique names, we cannot give a node a string identifier. It is always identified by the index into the node list.
It would be good to allow for different identifiers like strings or any hashable object.
This problem particulary is very annoying. Because the graph easily gets "duplicate nodes" added.
Because when adding a node, it doesn't check the data of the other nodes. But checks only the idx. But if two nodes have the same data (and this suggests they are the same node) it adds it twice.
checking code.
I would recommend you to just start from scratch. No need to reuse anything else than the naming like RzGraph.
Yes current idx identifier are assigned uniquely in the scope of single graph, and suggest no same nodes for graph_add_node.
Thank you, get it now.
I think it would be good to assign a hash() function when the graph is created. So graph_add_node takes the node data, passes it to hash() and gets the key of the node this way. The has function can be implemented by the user. So if a node is simply an address it can look something like hash(void *ndata) { return (AddrNodeData *)ndata->address; }.
IIRC the util/graph.c was designed so that the RzGraph/RzGraphNode would be embedded into more powerful structures. See RzAGraph for examples. That said, i'm sure it can be improved!
Also we should make the printing of a graph (in different graph formats, dot, GML etc.) independent of RzCore.
Currently rz_core_graph_print_graph() needs the core. Which doesn't make sense.
Also, https://github.com/rizinorg/rizin/pull/4753#discussion_r1864303893 can be addressed with this change.
@Heersin have you had a chance to look at it?
One feature more I'd like to see. It would be nice if the graph can be initialized with a certain capacity of notes.
Let's assume we know in advance the rough size our graph has (edges and nodes) it would be cheaper if we could allocate a graph with this size ones, and then add the nodes and edges. Only reallocating more memory if more edges/nodes than expected were added.