jts
jts copied to clipboard
Change Geometry.equals(Geometry) to have equalsExact semantics
A long-standing confusion in the JTS Geometry class is that Geometry.equals(Geometry) has the semantics of topological-equality, rather than exact-equality (which is more usual for Java). This impacts using JTS in collections or frameworks which rely on equals having standard Java semantics (for example, ORMs such as JPA). In most use cases for collections and containers the desire is for exact-equality semantics.
Even more confusingly, to mitigate this the method Geometry.equals(Object) was added, which has exact-equality semantics. This helps ensure that some container classes "do the right thing" of using the faster exact-equality, but it's hard to be sure they are doing this without close inspection.
Geometry.equals(Geometry) should be changed to have exact-equality semantics.
To compute topological-equality the Geometry.equalsTopo(Geometry) method can be used. The use cases for topological-equality seem to be fairly rare, so having a slightly more verbose method name should not be a problem. This causes a slight discrepancy with the standard OGC Spatial Predicate definition, but the benefits of comprehension and safety outweigh this.
When this change is made the existing Geometry.equalsExact(Geometry) method will be redundant and can be deprecated. EDIT: Geometry.equalsExact(Geometry) will be kept, since the name clearly indicates the semantics. equals will delegate to it.
NOTE that this is definitely a breaking API change.
NOTE 2: As pointed out below, this does not affect the implentation of hashCode(), which is currently appropriate for either definition of equality.
Wouldn't the more painful part be that Geometry.equals(Geometry) uses the semantics of equalsTopo, whereas Geometry.equals(object) uses the semantics of equalsExact?
i.e., assuming JTS works the same as NTS, you ought to wind up with this apparent inconsistency, which seems more confusing (to me) than the exact truth value on either line (especially in real-world scenarios where o2 is only typed as Object by virtue of it being a key in a hash-based collection):
static void tst()
{
Geometry g1 = createNewGeom();
Geometry g2 = createSimilarGeom(); /* such that g1.equalsTopo(g2) && !g1.equalsExact(g2) */
Object o2 = g2;
g1.equals(g2); /* true */
g1.equals(o2); /* false */
}
Also, rather than being impossible*, hashCode() is already correct for both topological and "exact" equality: it will always return the same value for two topologically equal instances, therefore it will always return that same value for two "exactly" equal instances. Admittedly, you could invite a "better"** (correct) hashCode() implementation (i.e., likely fewer collisions) by restricting the semantics of Geometry.equals(Object) to "exact" equality, but just using the extents is sufficient.
*In fact, you can always trivially make a correct hashCode() implementation by simply doing return 0;... all implementations of that method which are more complicated than that are (knowingly or otherwise) voluntarily introducing that extra complexity in exchange for improving performance of hash-based collections that key off of that object, not because it's necessary for correctness.
**This actually wouldn't be unambiguously "better" overall, though: in many real-world cases, I expect that getEnvelopeInternal() will already be cached, so it's blindingly fast today... making it slower by, e.g., scanning the entire coordinate data (which I imagine is the most likely alternative) doesn't seem likely to eliminate enough collisions in "interesting" real-world scenarios to pay for itself. YMMV.
Also, rather than being impossible*, hashCode() is already correct for both topological and "exact" equality
Yes, correct. My error. So this is not a driver for this change. Updated the issue to reflect this.
And agreed that the current hashcode based on extent is a good balance of performance and selectivity.
Wouldn't the more painful part be that Geometry.equals(Geometry) uses the semantics of equalsTopo, whereas Geometry.equals(object) uses the semantics of equalsExact?
Yes, exactly. That's why this is a breaking change.
The example you give is why this change would improve the API, by removing the un-obvious difference between equals(Geometry) and equals(Object). Or am I missing your point?
The example you give is why this change would improve the API, by removing the un-obvious difference between
equals(Geometry)andequals(Object). Or am I missing your point?
I'm missing something in general, I guess.
The way the issue is described, it sounds like the implementation of Geometry.equals(Geometry) needs to change (because of the two methods that "Geometry.equals" could refer to, this is the only one that hasn't been implemented that way for a while now).
But then as it goes on, it justifies itself as though the method to be changed is Geometry.equals(Object) ("frameworks which rely on equals having standard Java semantics" and the (edit: now-deleted) note about hashCode()).
The description doesn't (and definitely didn't) read like "change equals(Geometry) to behave the same way that equals(Object) does", which is why I chose to ask for clarification.
Good point - I will clarify the issue description.
Please do this change, or implement equalsTopo support for GeometryCollection. The current implemtation throws a IllegalArgumentException on instances of GeometryCollection.
Please do this change, or implement equalsTopo support for GeometryCollection. The current implemtation throws a IllegalArgumentException on instances of GeometryCollection.
The reason there is no equalsTopo support (and in fact relate in general) for GeometryCollection is that it's too hard to compute the exact topology in certain situations (in particular, lines and polygons all crossing at non-vertices). So this won't happen, at least until some better algorithm comes along.
Will have to think about the effects of this breaking change before implementing it.
Just stumbled into this by doing some PMD related cleanups in GeoTools.
One of the PMD rules for tests is not to use assertTrue when assertEquals is a logical equivalent, e.g.:
assertEquals(a,b)
is better than:
assertTrue(a.equals(b))
as assertEquals has a more descriptive failure message, showing the two values, while assertTrue gives nothing unless a message is explicitly provided.
Now, doing the migration, with a and b being geometries, ended up changing the semantics of the comparison, from equalsTopo to equalsExact, because the assertEquals method in JUnit uses Object as argument, ends up bidinding the "Geometry.equals(Object)`` implementation. I know the reason why it happens, but still find it confusing.
@aaime yes, that's the kind of crazy-making thing that happens with the current API semantics. I take it then that you would be in favour of this change?
@aaime actually it seems to me to be suspect that the topological equality would be used in a unit test. It is a relatively weak condition, which might mask certain kinds of errors. And in a unit test it should be possible to be more precise about what the expected value is.
@airbreather Are you in favour of this change?
@airbreather Are you in favour of this change?
Hm. I am in favor of removing the inconsistency that I posted earlier, and if we have to change the semantics of one, then Geometry.equals(Geometry) seems to be the right one to change, even despite the OGC discrepancy.
I wonder if it might not make more sense to deprecate and start working towards removing Geometry.equals(Geometry) entirely? After all, if it's going to work exactly the same as Geometry.equals(object), then I don't see much of a reason to have a separate method for it other than backwards compatibility.
When this change is made the existing
Geometry.equalsExact(Geometry)method will be redundant and can be deprecated.
Personally, I prefer the equalsExact spelling, since it makes it clear how the equality check would be carried out.
@airbreather Are you in favour of this change?
Hm. I am in favor of removing the inconsistency that I posted earlier, and if we have to change the semantics of one, then
Geometry.equals(Geometry)seems to be the right one to change, even despite the OGC discrepancy.
Good.
I wonder if it might not make more sense to deprecate and start working towards removing
Geometry.equals(Geometry)entirely? After all, if it's going to work exactly the same asGeometry.equals(object), then I don't see much of a reason to have a separate method for it other than backwards compatibility.
Agreed. Start with deprecation, then remove.
When this change is made the existing
Geometry.equalsExact(Geometry)method will be redundant and can be deprecated.Personally, I prefer the
equalsExactspelling, since it makes it clear how the equality check would be carried out.
Agreed. Will keep equalsExact, since it is clear about what it means. equals will delegate to it.
I'm in agreement with any move that resolves the inconsistencies, the current situation is hard explain and reason about. The test that failed after migration from assertTrue to assertEquals really needed the equalsTopo behavior, but we can just update the tests to explicitly use that method when the chanage in JTS happens.
The test that failed after migration from assertTrue to assertEquals really needed the equalsTopo behavior, but we can just update the tests to explicitly use that method when the chanage in JTS happens.
Why not update it now, so that it works?
Cause I already have spent enough time on this. Those upgrading JTS in GeoTools will have a reason to change the tests (they are not many, I'm not setting a major roadblock in front of them).
@aaime thanks for pointing that out! Hopefully @jodygarnett and I will remember when the time comes!