jung
jung copied to clipboard
Implement a topological ordering algorithm
I was implementing a topological sorting prototype method yesterday out of curiosity, and I'm sharing it today in this PR to start a discussion about whether topological sorting has a place in JUNG, and if so, whether my implementation could be used as a starting point and if so how it could be improved.
@jbduncan I think that this is something that we're likely to do in common.graph
, so I'd rather do it there and let JUNG use that. (There's actually an internal implementation already, but it's not ready to release externally yet.)
I'd be happy for you to file an issue against Guava to provide such a thing, though.
@jrtom Sounds good to me! I'd love it if you could indeed open an issue against Guava for me. :)
If you've got time, please go ahead and file the issue against Guava (that's what I meant by my comment). Otherwise I'll get to it eventually.
Side note: I haven't looked at your implementation yet, but it's quite possible that it's better in some ways than what we have internally, so feel free to reference this PR in your issue if you like. :)
If you've got time, please go ahead and file the issue against Guava (that's what I meant by my comment). Otherwise I'll get to it eventually.
Oops, I misread your earlier comment, silly me!
Side note: I haven't looked at your implementation yet, but it's quite possible that it's better in some ways than what we have internally, so feel free to reference this PR in your issue if you like. :)
I'll keep this PR open then as a reminder for myself to open the issue and discuss my impl with the rest of the Guava team. :)
@jrtom I don't know if it's worth keeping this PR open now, as https://github.com/google/guava/issues/3025 was closed and I don't know if there is an existing implementation of Topological Sort that is planned on being open sourced either in JUNG or Guava.
WDYT?
Oh, I've just realised that https://github.com/google/guava/issues/2641 is still open! But I'm still unsure about whether it's worth closing this PR or not.
I've copied this implementation into my own little graph library: https://github.com/jbduncan/guava-graph-utils/blob/2525ac6490cb3e35baf8f26f2c55b281de0de630/src/main/java/me/jbduncan/guavagraphutils/MoreGraphs.java#L337-L429
Over time I've come to realise two drawbacks of this approach:
- It uses Kahn's algorithm, which needs both
Graph::inDegree
andGraph::successors
. But the other common algorithm, reversed depth-first postorder (I believe that's right? It's not preorder, is it?), only needsGraph::successors
and so the parameter could be made aSuccessorsFunction
, which is more flexible. - The returned set's
contains
implementation is O(n) because it inheritsAbstractSet::contains
, which iterates over the iterator rather than peering into the underlying hash table in O(1) time. But this could be fixed by:- Overriding
contains
to usegraph.nodes()::contains
:- Pro: It continues to give us a lazy set.
- Con: It precludes the possibility of turning the graph parameter into a successors function.
- Changing the implementation to return an
ImmutableSet
:- Pro: It allows the graph parameter to be turned into a successors function.
- Con: It forces an up-front calculation and memory cost.
- Changing the implementation to return a lazy
Iterable
:- Pros:
- It continues to give us a lazy view, with all its performance benefits.
- It allows the graph parameter to be turned into a successors function.
- Con: To use it as a set, it has to be passed to
ImmutableSet::copyOf
.
- Pros:
- Overriding
So if this PR ever sees the light of day, I would change it to return an Iterable
, as I believe that has the largest pro-to-con ratio.
Hey, @jbduncan, good to hear from you. :)
FYI, I am still paying attention to this project, and I do still intend to release 3.0 (i.e., not just a snapshot version, which is what there currently is). I'm navigating some organizational changes at work but I'm hoping to have a bit more bandwidth for this (and for Guava) within the next few months. (I've done some minor cleanup recently.)
Guava does have an internal implementation, although I'm not sure it's the best one (part of why it's still internal), so yours might be preferable.
I think that your argument for approach #3 (returning an Iterable
) is solid; I think it's reasonable to assume that much of the time when one wants a topological sorting that you just want to iterate over it. The fact that it's a Set
is intrinsic in the definition of the output of a topological sort, so I don't think you're losing important semantics there. Plus, it's literally the same set as the graph's node set; the important thing is the ordering.
Hi @jrtom, great to hear from you, too! It's been a long time.
I'm glad JUNG is still on your radar, even if work is keeping you busy. And good to hear that Guava's internal implementation is still there but needs further consideration.
I think that your argument for approach https://github.com/jrtom/jung/issues/3 (returning an
Iterable
) is solid; I think it's reasonable to assume that much of the time when one wants a topological sorting that you just want to iterate over it. The fact that it's aSet
is intrinsic in the definition of the output of a topological sort, so I don't think you're losing important semantics there. Plus, it's literally the same set as the graph's node set; the important thing is the ordering.
Thank you for your thoughts, much appreciated. These are all good points; I too realised that the topological sorting set and the node set are one and the same, but only as I was writing my last response out. So the Iterable
approach it is, then!
In my mind, we'd ideally put topological sorting into common.graph.Traverser
to keep similar concepts together, so it lends more credence to the idea of waiting until Guava's version is ready, if ever. (How I wish I could just peer into Google's codebase. :smile:) In the meantime, I'll apply approach (3) to my copy of this PR's code in guava-graph-utils.
Something I hadn't fully realised before is that the current Kahn's-algorithm-based approach and an approach based on reverse depth-first postorder are both eager in some way.
This is because Kahn's algorithm relies on sets of roots and non-roots to work, which this PR has to calculate upfront on every iterator()
call in O(n) time. As for reversed depth-first postorder, the entire depth-first order of the graph has to be calculated before it can be reversed, which is an upfront O(n) calculation again*.
Thankfully, for both these approaches, their eagerness only applies when a new iterator()
is requested, so there's still some value in returning an Iterable
as per approach (3).
As a future improvement, if we could determine that the input was of a hypothetical graph type with roots()
and nonRoots()
methods which are pre-cached like List::size
, then we could eliminate the need to calculate the roots and non-roots every time and use Kahn's algorithm lazily.
*Unless the user actually wants the reversed topological order of the graph, which could then be lazy, but such a method is outside the purview of this PR.
As a future improvement, if we could determine that the input was of a hypothetical graph type with
roots()
andnonRoots()
methods which are pre-cached like List::size, then we could eliminate the need to calculate the roots and non-roots every time and use Kahn's algorithm lazily.
But that would only work if roots()
was a Queue
and nonRoots()
was a Multiset
and if both were mutable as required by the current implementation, so maybe not. :thinking:
I've also come to realise that with depth-first as the implementation, we need either of these two API variants:
public static <N> Iterable<N> topologicallySortedNodes(Graph<N> graph) {
return topologicallySortedNodes(graph.nodes(), graph);
}
public static <N> Iterable<N> topologicallySortedNodes(Iterable<N> startingNodes, SuccessorsFunction<N> successorsFunction) {
...
}
This is because a topological sorting is typically derived from the entire node set, which has to be passed in somehow. The second version with startingNodes
and a successorsFunction
is similar to how google/mug's version works.
Actually, since the depth-first search algorithm needs to make a list internally to reverse its results before returning them anyway, I wonder if the depth-first search version should have its return type be ImmutableList
directly so that if the result is passed to ImmutableList::copyOf
, it avoids unnecessary copies.
Okay, I've convinced myself. In my library, the depth-first version will return a list, and the Kahn's algorithm version will still return an Iterable
for now. (On reading JGraphT's TopologicalOrderIterator
on GitHub, it seems to follow a lazy approach using Kahn's algorithm too, so there's some precedence for this.)
I've decided to close this PR for now as I'm quite happy with my topological sorting methods in guava-graph-utils. 😁