rustworkx icon indicating copy to clipboard operation
rustworkx copied to clipboard

Add an `initial_nodes` argument to topological sorters

Open jakelishman opened this issue 6 months ago • 1 comments

What is the expected enhancement?

For both lexicographical_topological_sort and TopologicalSorter, it would be convenient (and possibly more performant) on Qiskit if we could pass a list of start-point indices to the sorter. This is for two purposes:

  1. we don't need Rustworkx to iterate through the whole graph to find the initial nodes; in the cases I'm thinking of for Qiskit, we already know the node indices we ought to start from.
  2. this would let us start the topological ordering from a mid point of the ordering, such as if we only want to consider things that happen in the graph after a certain point, without having to calculate the order up to that point.

Give-or-take, my proposed API would be to expose an initial_nodes: Optional[Sequence[int]] argument to lexicographical_topological_sort and TopologicalSorter. If not given, the logic remains the same as it is now. If it is given, the given initial_nodes are treated as the start point, even if they do no have zero in degree. All other nodes are treated identically, so if there are nodes that cannot be visited because they come topologically after nodes that do not strictly dominate the initial_nodes, they are simply never returned.

Logically, I think this can be achieved by slightly modifying the population of the in_degree maps in the sorters to do some entry(node).get_or_insert_with(|node| in_degree(node)) type of stuff in the ordering calculations, so the in degree of nodes is calculated lazily as a node first becomes visitable. In the case that initial_nodes was not passed, we'd still eagerly populate the whole in_degree map.

Similar to #1105, I'm happy to supply a PR if this is an acceptable feature.

jakelishman avatar Feb 21 '24 13:02 jakelishman

Hmm, actually an edge case in the lazy calculation: if a node is zero in degree, but not in the initial_nodes list, it wouldn't get visited.

For my purposes, that's actually the situation I would like to have, because my main goal with this is to prevent a complete iteration through the nodes of the graph for graphs with very large numbers of nodes. This poses a couple of interface questions from my perspective:

  1. if we keep the "zero-degree but not initial are not visited" behaviour, is initial_nodes a good name for the argument? i.e., can we treat it purely as a documented case and push users to the right intuition with the parameter name?
  2. is there a sensible way to change the interface, such that it's still possible to prevent an initial all-nodes iteration while still being able to find zero-indegree nodes if necessary?

My main reasons for this are related to TopologicalSorter rather than lexicographical_topological_sort, but it does apply to both. There's several use-cases we've recently identified on Qiskit where we want to use the on-line sorter to have a short-circuiting topological-order generator, because we're going to need to get parts of the topological order repeatedly in loops while adding/changing nodes, and the cost of the full-graph iteration is (I think - I haven't actually prototyped it yet) too high to bear (it makes the best-case performance of certain algorithms quadratic, whereas the best case for them should be linear).

jakelishman avatar Feb 21 '24 13:02 jakelishman