guava icon indicating copy to clipboard operation
guava copied to clipboard

Graphs.transitiveClosure adding self-loops is unintuitive

Open meindratheal opened this issue 8 years ago • 13 comments

I've been working on a project that centres around DAGs. I was very surprised to find cycles cropping up, so I tracked the issue down to Graphs.transitiveClosure.

I did a bit of reading (Wikipedia mostly) to see if this was expected behaviour. However, everything I saw pointed towards self-cycles not being a requirement of a transitive closure. I found two different images on Wikipedia that show a closure with no self-cycle, The description on the Transitive Closure wiki page itself uses the example that nodes are airports and edges are flights, and the transitive closure is the graph of everything reachable from a node in one or more hops.

Then, I thought to actually check the documentation more thoroughly. It looks like this is intended behaviour, but should this behaviour maybe be changed? (Maybe a flag parameter that can be passed to disable self-cycles) If not, maybe the documentation could make this a little clearer, since the Guava notion of the transitive closure is slightly different from the usual definition (unless I've really misunderstood things) - reachability elsewhere is defined as within 1 or more hops, but in Guava it is 0 or more.

There's a one-line workaround at least (graph.nodes().forEach(n -> graph.removeEdge(n, n));), hopefully that doesn't fall afoul of any concurrent modification rules.

meindratheal avatar Apr 05 '17 01:04 meindratheal

I wonder if we might key this to whether the original graph permits self loops?

lowasser avatar Apr 05 '17 01:04 lowasser

We could certainly key this to whether the original graph permits self-loops; that's easy.

The question of correctness is a bit tricky, though. Transitive closure is determined in terms of reachability (at least one reference calls a transitive closure a "reachability graph"), and the definition of reachability that seems to be the most prevalent IME is "w is reachable from u if there exists a path from u to w". Note that in Guava's definition of Graphs.reachableNodes() we specify that u is reachable from itself (and this seems to be consistent with most definitions AFAICT).

So the question is whether it makes sense to explicitly represent the reachability of u from itself as a self-loop or not.

Some options for resolving this: (1) Make the definition more explicit so people won't be surprised. (Right now the definition incorporates the definition of reachability by reference.) (2) Let the addition of self-loops be contingent on whether the original graph permits them. (3) Add an option (parallel method, boolean parameter, etc.) to allow (or require) the user to explicitly specify whether they want self-loops or not. (One option here: require the user to provide their own GraphBuilder instance.)

There may be other options as well.

Thoughts, anyone?

jrtom avatar Apr 05 '17 01:04 jrtom

I think requiring the user to specify whether they want self-loops or not is best, as it would be more explicit and help to prevent surprises.

Requiring a GraphBuilder rather than a boolean parameter is a cool idea, but I think it may be too verbose in practice. How about an enum class like the following, which would provide more context than just a boolean param?

public enum class SelfLoops {
  ALLOW, DISALLOW
}

jbduncan avatar Apr 08 '17 13:04 jbduncan

Why does the right answer not turn on the standard definition of transitive closure, where a closure is the smallest relation that i. includes the original relation and ii. is transitive? This means that for a directed graph self loops should be included iff a node is reachable from itself along directed path of positive length. (See issue 3187.)

alan-isaac avatar Oct 09 '18 20:10 alan-isaac

@alan-isaac I think that's a reasonable point; I think that it's also worth noting that including all self-loops may tend to conflate "transitive" and "reflexive".

(Although I have one quibble: I don't care what the length of the walk is from u to u, I just care whether it involves at least one edge.)

In any event, I think that the definition you're proposing is more or less where we're likely to end up.

jrtom avatar Oct 10 '18 22:10 jrtom

I don't think I understand the "quibble". Reflexive relations will of course have a shortest directed walk from u to u of length 1. For any element where reflexivity fails but a cycle exists, the shortest walk will be length > 1. Anyway, thank you for your apparent willingness to better align the implementation of transitive_closure with the standard definition for the transitive closure of a directed graph.

alan-isaac avatar Oct 11 '18 02:10 alan-isaac

The quibble is this: if the edges in the graph have weights, and the length of a u-v walk is defined as being the sum of the edge weights in that walk, then the (weighted) length of a u-u path can be less than 1, or even negative, but we'd still want to include the self-loop in the transitive closure.

(For the record, it's taken us a while to resolve this in part because this is one of the many definitional issues in which graph theory has not developed a consensus: some sources assert that a transitive closure should include self-loops for each node.)

jrtom avatar Oct 11 '18 02:10 jrtom

May I have a reference to a source that defines the transitive closure not to be a closure in the usual sense or somehow redefines closure to not require including the original relation. I've looked but have not found one. Are these sources somehow distinguishing between the transitive closure of a relation and its reachability relation, or is the latter also redefined to exclude cycles?

alan-isaac avatar Oct 11 '18 12:10 alan-isaac

@alan-isaac I think that there's a confusion here between two related concepts.

No resource that I've found has suggested that the transitive closure of a relation (graph) should not be a superset of the original relation (graph). (That was the subject of the other bug you linked to, I think.)

The point of controversy here is whether the transitive closure of a graph should always (or even optionally) induce self-loops incident to each node, based on the argument that any u-v walk should induce a u-v edge, including vacuous walks (those with no edges).

When I was investigating this last year, I found the following textbook definitions of transitive closure graphs:

  • one reference that explicitly excludes self-loops based on no-edge vacuous paths [1]
  • two references whose definitions do not exclude such self-loops (and thus implicitly include them) [2, 3]
  • one reference that explicitly includes such self-loops [4]

[1] Udi Manber, Introduction to Algorithms (p. 186) [2] Horowitz, Sahni, Rajasekaran, Computer Algorithms (p. 115) [3] Cormen, Leiserson, Rivest, Introduction to Algorithms (p. 87) [4] Ross and Wright, Discrete Mathematics (p. 115-120)

jrtom avatar Oct 11 '18 18:10 jrtom

OK, I see what you are talking about. But this is a matter of very odd choices of vocabulary. The transitive closure of a binary relation in a set X is by definition the smallest transitive relation in X that includes R. It is common enough to create another closure, often called the reflexive-transitive closure, which is the transitive closure of the reflexive closure. Some authors explicitly say they will call this the transitive closure for short (e.g., Computer Algorthms by Baase and Van Gelder), but they should not. In addition to these, some authors introduce a notion of an irreflexive transitive closure, but this is very bad terminology, since the idea of an irreflexive closure (i.e., superset) is inchoherent.

Thank you for clarifying. In my view, if one wants all self loops in a relation that is not reflexive, one should produce the transitive closure of the reflexive closure. (However, providing an option for this seems fine to me.) For the transitive closure itself, a self loop should always indicate a positive-length cycle (possibly of length 1). To be a proper closure, any self loops in the original relation must be retained. In terms of the discussion above, therefore, the question really cannot be whether to "allow" self loops, since they may be required by a transitive closure. It is instead whether to add self loops (i.e., to create the reflexive transitive closure). Of course, one might offer an option to remove self loops, with the understanding that the result will not be a closure.

alan-isaac avatar Oct 11 '18 19:10 alan-isaac

I expect that most people that want the transitive closure of a graph have never heard of the reflexive property. :) In any event, I agree that the idea of calling the transitive closure of the reflexive closure a transitive closure per se is weird.

You are correct that the question is whether to add self-loops or not. I agree that that can be viewed as a reflexive closure; it can also be viewed as a side effect of a transitive closure in which one declares that for each node u there is always (at least) a zero-edge u-u walk.

Anyway, I think we're on the same page as to what is desirable, and hopefully we'll be able to get this to happen in the right way soon.

jrtom avatar Oct 12 '18 05:10 jrtom

I see this feature is still not implemented, for the reference see python networkx impl https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.dag.transitive_closure.html has additional parameter 'reflexive': A reflexive transitive closure creates a self-loop for the path from v to v of length 0. The usual transitive closure creates a self-loop only if a cycle exists (a path from v to v with length > 0)

tadamovsky avatar Oct 23 '23 12:10 tadamovsky

I still agree that we should address this. It's on my list of items to work out with the rest of the Guava team. :)

jrtom avatar Oct 24 '23 05:10 jrtom