rustworkx
rustworkx copied to clipboard
[joss] Questions about graph creation
This is not a bug report, just a question. I hope it's fine to ask it here, since there is currently no forum-like alternative for support questions.
How can I create a graph from a list of edges in terms of node names (not indices)? For example, I can create an undirected graph from an edge list given in terms of indices like this:
g=rx.PyGraph()
g.extend_from_edge_list([(0,1),(1,2)])
But what if I have node names, e.g. [('a', 'b'), ('b', 'c')]
? I suspect that I am missing some simple way to handle such edge lists. Or perhaps the list is [(10,20),(20,5)]
, where the three nodes of the graphs would be named 10
, 20
and 5
.
Once the graph is created, how can I refer to nodes by name instead of by their index? How can I look up the index of node 'c'
, for example?
FWIW, issues are fine for this IMO, especially because they often can lead to documentation improvements around user facing docs. Our intro documentation in particular is a bit lacking still (see #416 for the tracking issue on that) so this is useful to track for improving that.
The extend_from_edge_list
method only works with node indices as the index is what uniquely represents a node in the graph. So it can be used as a shortcut to create a graph where you don't want to set node weights directly. There is no built-in mechanism to the PyGraph
class to translate from a weight/data payload/name to the index (aside from .find_node_by_weight()
). So if you want to create a graph [('a', 'b'), ('b', 'c')]
you could do something like:
g=rx.PyGraph()
(node_a, node_b, node_c) = g.add_nodes_from(['a', 'b', 'c'])
g.add_edges_from_no_data([(node_a, node_b), (node_b, node_c)]) # to define with edge weights use .add_edges_from
With this approach the weights of the nodes can be anything, so you could use 10
, 20
, 5
if you wanted instead of 'a', 'b', 'c', or do something crazy where the weights are more complex python objects (including other retworkx graphs or numpy arrays, etc).
For the final question the easiest and most efficient way is to get the list of all node weights in the graph with g.nodes()
which return a list of the node payload/weight objects and the position in that list is the node index. For example in your g.extend_from_edge_list([('a','b'),('b','c')])
case if you did g.nodes()
it would return a sequence ['a', 'b', 'c']
so if you did something like {node: index for index, node in enumerate(g.nodes())}
you would get a dictionary mapping the node name to their index: {'a': 0, 'b': 1, 'c': 2}
. The .find_node_by_weight()
method I mentioned above does this internally for a single node so doing it as I suggested here is normally efficient if you want it for all nodes.
It's also worth pointing out at this point there is an open enhancement issue under discussion in #496 on adding an additional graph type that abstracts this away for you (so you can look up a node by c
directly), but that hasn't been added yet and probably won't be until the release after next.
Thank you for the detailed explanation @mtreinish, this is very helpful.
I notice how you called the 'a'
, 'b'
, etc. weights while I naively called them names, but in reality it is neither. It is the single data payload object associated with the vertex. I am wondering if it would be good to make the terminology consistent, especially within the API. For example, there is:
-
.weighted_edge_list()
, where it's called "weight". The same for.find_node_by_weight()
-
.get_edge_data()
where it is called "data" -
.update_edge()
which does not mention it by name at all
I agree having a consistent naming would be good. In the documentation I normally use "weight/data payload" so I'd probably go with one of those. That being said I'm typically concerned about backwards compatibility with making changes like that this and needing to weigh the value of consistency over breaking our existing users. That being said we did just start planning for an eventual 1.0.0 release which I think settling on a standard terminology would be good to do as part of that long term goal. We can work towards it by settling on a convention and adding APIs that use it, then gradually deprecate the APIs outside of the chosen convention and ultimately remove the old APIs as part of the 1.0.0 release.
On the other hand, this is a new library which hasn't grown too big yet, and which (presumably) has very few users other than Qiskit, which you have control over. You have the opportunity to stand out by having an excellent and consistent design.
I am constantly annoyed by igraph's historical baggage and envy those who get to start with a clean sheet. One of the reasons I volunteered to review retworkx was because I saw it as a rare opportunity to see how a new graph library is designed, and to see how you might avoid the mistakes that others have made in the past. I think we can learn from each other, and during the review I'll try to touch on all the things that I believe are critical design issues for any graph library. Also, feel free to ask me questions about how we solved various problems in igraph, either during or after the review.