jung Implement a topological ordering algorithm

I was implementing a topological sorting prototype method yesterday out of curiosity, and I'm sharing it today in this PR to start a discussion about whether topological sorting has a place in JUNG, and if so, whether my implementation could be used as a starting point and if so how it could be improved.

Jan 07 '18 18:01 jbduncan

@jbduncan I think that this is something that we're likely to do in common.graph, so I'd rather do it there and let JUNG use that. (There's actually an internal implementation already, but it's not ready to release externally yet.)

I'd be happy for you to file an issue against Guava to provide such a thing, though.

Jan 07 '18 23:01 jrtom

@jrtom Sounds good to me! I'd love it if you could indeed open an issue against Guava for me. :)

Jan 07 '18 23:01 jbduncan

If you've got time, please go ahead and file the issue against Guava (that's what I meant by my comment). Otherwise I'll get to it eventually.

Side note: I haven't looked at your implementation yet, but it's quite possible that it's better in some ways than what we have internally, so feel free to reference this PR in your issue if you like. :)

Jan 07 '18 23:01 jrtom

If you've got time, please go ahead and file the issue against Guava (that's what I meant by my comment). Otherwise I'll get to it eventually.

Oops, I misread your earlier comment, silly me!

Side note: I haven't looked at your implementation yet, but it's quite possible that it's better in some ways than what we have internally, so feel free to reference this PR in your issue if you like. :)

I'll keep this PR open then as a reminder for myself to open the issue and discuss my impl with the rest of the Guava team. :)

Jan 07 '18 23:01 jbduncan

@jrtom I don't know if it's worth keeping this PR open now, as https://github.com/google/guava/issues/3025 was closed and I don't know if there is an existing implementation of Topological Sort that is planned on being open sourced either in JUNG or Guava.

WDYT?

Apr 19 '18 18:04 jbduncan

Oh, I've just realised that https://github.com/google/guava/issues/2641 is still open! But I'm still unsure about whether it's worth closing this PR or not.

Apr 19 '18 18:04 jbduncan

I've copied this implementation into my own little graph library: https://github.com/jbduncan/guava-graph-utils/blob/2525ac6490cb3e35baf8f26f2c55b281de0de630/src/main/java/me/jbduncan/guavagraphutils/MoreGraphs.java#L337-L429

Jan 03 '23 18:01 jbduncan

Over time I've come to realise two drawbacks of this approach:

It uses Kahn's algorithm, which needs both Graph::inDegree and Graph::successors. But the other common algorithm, reversed depth-first postorder (I believe that's right? It's not preorder, is it?), only needs Graph::successors and so the parameter could be made a SuccessorsFunction, which is more flexible.
The returned set's contains implementation is O(n) because it inherits AbstractSet::contains, which iterates over the iterator rather than peering into the underlying hash table in O(1) time. But this could be fixed by:
1. Overriding contains to use graph.nodes()::contains:
  1. Pro: It continues to give us a lazy set.
  2. Con: It precludes the possibility of turning the graph parameter into a successors function.
2. Changing the implementation to return an ImmutableSet:
  1. Pro: It allows the graph parameter to be turned into a successors function.
  2. Con: It forces an up-front calculation and memory cost.
3. Changing the implementation to return a lazy Iterable:
  1. Pros:
    1. It continues to give us a lazy view, with all its performance benefits.
    2. It allows the graph parameter to be turned into a successors function.
  2. Con: To use it as a set, it has to be passed to ImmutableSet::copyOf.

So if this PR ever sees the light of day, I would change it to return an Iterable, as I believe that has the largest pro-to-con ratio.

Feb 11 '23 19:02 jbduncan

Hey, @jbduncan, good to hear from you. :)
FYI, I am still paying attention to this project, and I do still intend to release 3.0 (i.e., not just a snapshot version, which is what there currently is). I'm navigating some organizational changes at work but I'm hoping to have a bit more bandwidth for this (and for Guava) within the next few months. (I've done some minor cleanup recently.)

Guava does have an internal implementation, although I'm not sure it's the best one (part of why it's still internal), so yours might be preferable.

I think that your argument for approach #3 (returning an Iterable) is solid; I think it's reasonable to assume that much of the time when one wants a topological sorting that you just want to iterate over it. The fact that it's a Set is intrinsic in the definition of the output of a topological sort, so I don't think you're losing important semantics there. Plus, it's literally the same set as the graph's node set; the important thing is the ordering.

Feb 11 '23 19:02 jrtom

Hi @jrtom, great to hear from you, too! It's been a long time.

I'm glad JUNG is still on your radar, even if work is keeping you busy. And good to hear that Guava's internal implementation is still there but needs further consideration.

I think that your argument for approach https://github.com/jrtom/jung/issues/3 (returning an Iterable) is solid; I think it's reasonable to assume that much of the time when one wants a topological sorting that you just want to iterate over it. The fact that it's a Set is intrinsic in the definition of the output of a topological sort, so I don't think you're losing important semantics there. Plus, it's literally the same set as the graph's node set; the important thing is the ordering.

Thank you for your thoughts, much appreciated. These are all good points; I too realised that the topological sorting set and the node set are one and the same, but only as I was writing my last response out. So the Iterable approach it is, then!

In my mind, we'd ideally put topological sorting into common.graph.Traverser to keep similar concepts together, so it lends more credence to the idea of waiting until Guava's version is ready, if ever. (How I wish I could just peer into Google's codebase. :smile:) In the meantime, I'll apply approach (3) to my copy of this PR's code in guava-graph-utils.

Feb 11 '23 23:02 jbduncan

Something I hadn't fully realised before is that the current Kahn's-algorithm-based approach and an approach based on reverse depth-first postorder are both eager in some way.

This is because Kahn's algorithm relies on sets of roots and non-roots to work, which this PR has to calculate upfront on every iterator() call in O(n) time. As for reversed depth-first postorder, the entire depth-first order of the graph has to be calculated before it can be reversed, which is an upfront O(n) calculation again*.

Thankfully, for both these approaches, their eagerness only applies when a new iterator() is requested, so there's still some value in returning an Iterable as per approach (3).

As a future improvement, if we could determine that the input was of a hypothetical graph type with roots() and nonRoots() methods which are pre-cached like List::size, then we could eliminate the need to calculate the roots and non-roots every time and use Kahn's algorithm lazily.

*Unless the user actually wants the reversed topological order of the graph, which could then be lazy, but such a method is outside the purview of this PR.

Feb 12 '23 01:02 jbduncan

As a future improvement, if we could determine that the input was of a hypothetical graph type with roots() and nonRoots() methods which are pre-cached like List::size, then we could eliminate the need to calculate the roots and non-roots every time and use Kahn's algorithm lazily.

But that would only work if roots() was a Queue and nonRoots() was a Multiset and if both were mutable as required by the current implementation, so maybe not. :thinking:

Feb 12 '23 01:02 jbduncan

I've also come to realise that with depth-first as the implementation, we need either of these two API variants:

public static <N> Iterable<N> topologicallySortedNodes(Graph<N> graph) {
  return topologicallySortedNodes(graph.nodes(), graph);
}

public static <N> Iterable<N> topologicallySortedNodes(Iterable<N> startingNodes, SuccessorsFunction<N> successorsFunction) {
  ...
}

This is because a topological sorting is typically derived from the entire node set, which has to be passed in somehow. The second version with startingNodes and a successorsFunction is similar to how google/mug's version works.

Feb 12 '23 19:02 jbduncan

Actually, since the depth-first search algorithm needs to make a list internally to reverse its results before returning them anyway, I wonder if the depth-first search version should have its return type be ImmutableList directly so that if the result is passed to ImmutableList::copyOf, it avoids unnecessary copies.

Okay, I've convinced myself. In my library, the depth-first version will return a list, and the Kahn's algorithm version will still return an Iterable for now. (On reading JGraphT's TopologicalOrderIterator on GitHub, it seems to follow a lazy approach using Kahn's algorithm too, so there's some precedence for this.)

Feb 28 '23 00:02 jbduncan

I've decided to close this PR for now as I'm quite happy with my topological sorting methods in guava-graph-utils. 😁

Jun 07 '24 10:06 jbduncan

jung jung copied to clipboard

Implement a topological ordering algorithm

jung
jung copied to clipboard