rizin Extend graph implementations

Is your feature request related to a problem? Please describe.

Currently we have only a single graph implementation which is list based. The main problem, besides high lookup times, is that we cannot (easily) define our own indices for nodes. Instead we have to trace them in a hash map on our own.

This is annoying, because:

Many graphs nodes we have in Rizin, could be easily identified by an address.
Lookup times could be reduced, if we would use an adjacent matrix implementationaa with a hash map.
~~rz_graph_add_node doesn't check, if the node was already added (because it needs to compare the user supplied data, since the index says nothing about the node). Really annoying.~~ This is "fixed" (very inefficiently) in https://github.com/rizinorg/rizin/pull/4212

Describe the solution you'd like

Extending the list base graph, so it supports arbitrary node identifiers (and checks for edge and node duplicates).
Add a adjacency matrix based graph.
Add the ability to pre-allocate memory for X nodes/edges (see: https://github.com/rizinorg/rizin/issues/4231#issuecomment-3368401723)

Describe alternatives you've considered

Point 1. from above.

Additional context

We probably should resolve https://github.com/rizinorg/rizin/issues/397 before.

Also relevant is https://github.com/rizinorg/rizin/issues/992

See files:

librz/util/graph.c
librz/util/graph_drawable.c
librz/core/agraph.c

Feb 15 '24 05:02 Rot127

By any means necessary. The new graph should hash its nodes and edges. And document how exactly it identifies the nodes.

Currently, it compares only pointers of nodes to determine if a node is in the graph or not. So it is not possible to make a copy of a node and check if it is in the graph. Pointers will mismatch, although the content is the same, and it says no. Which is of course nonsense.

Jul 26 '24 10:07 Rot127

checking code. may I ask the meaning of " it supports arbitrary node identifiers" @Rot127 ?

Aug 12 '24 12:08 Heersin

Check RzGraphNode. It has an index idx (I think into the node list) and the data. The idx is the only unique identifier for the node. It can't have a name or something.

E.g. if you have a graph with unique names, we cannot give a node a string identifier. It is always identified by the index into the node list.

It would be good to allow for different identifiers like strings or any hashable object.

Aug 12 '24 13:08 Rot127

This problem particulary is very annoying. Because the graph easily gets "duplicate nodes" added. Because when adding a node, it doesn't check the data of the other nodes. But checks only the idx. But if two nodes have the same data (and this suggests they are the same node) it adds it twice.

checking code.

I would recommend you to just start from scratch. No need to reuse anything else than the naming like RzGraph.

Aug 12 '24 13:08 Rot127

Yes current idx identifier are assigned uniquely in the scope of single graph, and suggest no same nodes for graph_add_node. Thank you, get it now.

Aug 12 '24 13:08 Heersin

I think it would be good to assign a hash() function when the graph is created. So graph_add_node takes the node data, passes it to hash() and gets the key of the node this way. The has function can be implemented by the user. So if a node is simply an address it can look something like hash(void *ndata) { return (AddrNodeData *)ndata->address; }.

Aug 13 '24 02:08 Rot127

IIRC the util/graph.c was designed so that the RzGraph/RzGraphNode would be embedded into more powerful structures. See RzAGraph for examples. That said, i'm sure it can be improved!

Aug 13 '24 17:08 ret2libc

Also we should make the printing of a graph (in different graph formats, dot, GML etc.) independent of RzCore. Currently rz_core_graph_print_graph() needs the core. Which doesn't make sense.

Also, https://github.com/rizinorg/rizin/pull/4753#discussion_r1864303893 can be addressed with this change.

Dec 02 '24 16:12 Rot127

@Heersin have you had a chance to look at it?

Dec 03 '24 04:12 XVilka

One feature more I'd like to see. It would be nice if the graph can be initialized with a certain capacity of notes.

Let's assume we know in advance the rough size our graph has (edges and nodes) it would be cheaper if we could allocate a graph with this size ones, and then add the nodes and edges. Only reallocating more memory if more edges/nodes than expected were added.

Oct 04 '25 16:10 Rot127