jts icon indicating copy to clipboard operation
jts copied to clipboard

STRtree.remove() should not use object reference equality

Open airbreather opened this issue 6 years ago • 12 comments

As a user, I would expect STRtree.remove(Envelope itemEnv, Object item) to use the same kind of equality test that List<E>.remove(Object o) uses, so that I can remove items from the tree without having to keep track of the exact same boxes that I originally inserted.

However, STRtree.remove() ultimately uses object reference equality, which leads to the behavior seen in the following program:

import java.util.ArrayList;

import org.locationtech.jts.geom.Envelope;
import org.locationtech.jts.index.strtree.STRtree;

class Program {
    public static void main(String args[]) {
        Envelope env = new Envelope(0, 0, 0, 0);

        // 'long' because remove(int) is a different method.
        for (long i = 0; i < 150; i++) {
            STRtree tree = new STRtree();
            tree.insert(env, i);
            boolean removedFromTree = tree.remove(env, i);

            ArrayList list = new ArrayList();
            list.add(i);
            boolean removedFromList = list.remove(i);

            if (removedFromTree != removedFromList) {
                System.out.println("Discrepancy with i = " + i);
            }
        }
    }
}

Running this yields the following output (javac 13.0.1, Java 13.0.1):

Discrepancy with i = 128
Discrepancy with i = 129
Discrepancy with i = 130
Discrepancy with i = 131
Discrepancy with i = 132
Discrepancy with i = 133
Discrepancy with i = 134
Discrepancy with i = 135
Discrepancy with i = 136
Discrepancy with i = 137
Discrepancy with i = 138
Discrepancy with i = 139
Discrepancy with i = 140
Discrepancy with i = 141
Discrepancy with i = 142
Discrepancy with i = 143
Discrepancy with i = 144
Discrepancy with i = 145
Discrepancy with i = 146
Discrepancy with i = 147
Discrepancy with i = 148
Discrepancy with i = 149

airbreather avatar Nov 21 '19 14:11 airbreather

Good point. The Java Integer cache means that small integers all share the same object, so that's going to cause the problem you are seeing.

Doe List.remove() use Object.equals() as the equality test? And if so does it remove all copies of an object which is equal, or just the first?

The original thinking behind using == was that likely the objects in the tree were going to be complex objects, for which reference equality made the most sense.

That choice also avoided the long-standing and unfortunate confusion between Geometry.equals as originally defined (which follows the OGC naming and uses the (very expensive) "topological equals" semantics), versus the more standard Java semantics of "equalsExact". This has been somewhat mitigated by the provision of Geometry.equals(Object) with standard Java semantics. So perhaps this could be used (which might require some coercion perhaps? Not sure how the overloaded method resolution would work in this context).

The alternative is to keep the current semantics, and warn users that they cannot rely on removal of value types. Is that a major use case?

dr-jts avatar Nov 21 '19 17:11 dr-jts

Doe List.remove() use Object.equals() as the equality test? And if so does it remove all copies of an object which is equal, or just the first?

According to the docs, ignoring nulls, it's the first object for which Object.equals() compares true.

The alternative is to keep the current semantics, and warn users that they cannot rely on removal of value types.

Another alternative could be to allow the owner of the tree to inject an instance of some interface with a boolean equals(Object o1, Object o2) used for this purpose. Default would do an object reference comparison (for backwards compatibility), and callers could opt in to doing something different.

Is that a major use case?

My specific use case was on the NetTopologySuite side of things. Stumbled across it when trying to port the concaveman algorithm to C#, using STRtree<T>'s structure more directly than most applications.

There, the tree contains individual points, and so I thought to just put the array indices into the tree to tighten up the memory layout across the board (it would also allow indexing into "colder" arrays with preprocessed results of trig functions for full spherical distance testing, as needed).

It's not just value types, though: "tree.remove(env, new CoordinateXY(x, y))" is practically never going to work unless the JVM gets some Skynet-level advancements.

airbreather avatar Nov 21 '19 17:11 airbreather

Doe List.remove() use Object.equals() as the equality test? And if so does it remove all copies of an object which is equal, or just the first?

According to the docs, ignoring nulls, it's the first object for which Object.equals() compares true.

Hmm.... "first" makes sense in context of a list, but what would be the semantics for a tree?

Another alternative could be to allow the owner of the tree to inject an instance of some interface with a boolean equals(Object o1, Object o2) used for this purpose. Default would do an object reference comparison (for backwards compatibility), and callers could opt in to doing something different.

Agreed. Or perhaps even simpler, just provide a flag (via a set method) which chooses between the two kinds of equality.

It's not just value types, though: "tree.remove(env, new CoordinateXY(x, y))" is practically never going to work unless the JVM gets some Skynet-level advancements.

Good point.

dr-jts avatar Nov 21 '19 17:11 dr-jts

My specific use case was on the NetTopologySuite side of things. Stumbled across it when trying to port the concaveman algorithm to C#, using STRtree<T>'s structure more directly than most applications.

Cool. Did you see the recent commit of HPRtree ? Would be interesting to know if that is faster. (And I believe Concaveman similarly uses Flatbush for performance?) . Although HPRtree does not yet implement remove.

KdTree might be faster than either, for points - but it doesn't have remove either. PRs gratefully accepted!

dr-jts avatar Nov 21 '19 17:11 dr-jts

Doe List.remove() use Object.equals() as the equality test? And if so does it remove all copies of an object which is equal, or just the first?

According to the docs, ignoring nulls, it's the first object for which Object.equals() compares true.

Hmm.... "first" makes sense in context of a list, but what would be the semantics for a tree?

Reading some more, List.remove(Object o) adds the "first" semantics to the inherited Collection.remove(Object o) method, which is otherwise the same, so it should be fine to just remove any one item that we encounter within the bounds.


My specific use case was on the NetTopologySuite side of things. Stumbled across it when trying to port the concaveman algorithm to C#, using STRtree<T>'s structure more directly than most applications.

Cool. Did you see the recent commit of HPRtree ? Would be interesting to know if that is faster. (And I believe Concaveman similarly uses Flatbush for performance?) . Although HPRtree does not yet implement remove.

The JS implementation of concaveman uses RBush.

I hadn't seen HPRtree yet, but from the wiki page, a Hilbert R-tree sounds like it could perform better in this use case than STR as nodes get deleted, since there's a reasonable way to have it adjust to the new size instead of holding onto the original structure the entire time.

That said, I'm wearing my "proprietary application developer" hat here rather than my "NetTopologySuite library developer" hat, and the PoC implementation that I'm going to present to our team should be fast enough for our use cases, and I can say that the spatial index performance is far from being the bottleneck here, so I'm unfortunately not going to be able to justify the time comparing Hilbert / k-d / STR.

airbreather avatar Nov 22 '19 12:11 airbreather

I hadn't seen HPRtree yet, but from the wiki page, a Hilbert R-tree sounds like it could perform better in this use case than STR as nodes get deleted, since there's a reasonable way to have it adjust to the new size instead of holding onto the original structure the entire time.

Not so sure about that... HPRree uses fixed-size arrays to hold the tree structure, so that would be harder to adjust than removing node objects in STRtree.

That said, I'm wearing my "proprietary application developer" hat here rather than my "NetTopologySuite library developer" hat, and the PoC implementation that I'm going to present to our team should be fast enough for our use cases, and I can say that the spatial index performance is far from being the bottleneck here, so I'm unfortunately not going to be able to justify the time comparing Hilbert / k-d / STR.

Fair enough. Will be interested to hear if you do useHPRtree in some application, and how it works out.

dr-jts avatar Nov 22 '19 15:11 dr-jts

Reading some more, List.remove(Object o) adds the "first" semantics to the inherited Collection.remove(Object o) method, which is otherwise the same, so it should be fine to just remove any one item that we encounter within the bounds.

Sounds good. So my best idea for providing this without breaking API is to add a setUseEquals method with an internal flag. Runner-up would be to add a removeEquals method, or perhaps an overloaded remove with a flag parameter. Thoughts?

dr-jts avatar Nov 22 '19 15:11 dr-jts

I hadn't seen HPRtree yet, but from the wiki page, a Hilbert R-tree sounds like it could perform better in this use case than STR as nodes get deleted, since there's a reasonable way to have it adjust to the new size instead of holding onto the original structure the entire time.

Not so sure about that... HPRree uses fixed-size arrays to hold the tree structure, so that would be harder to adjust than removing node objects in STRtree.

My mistake -- that would be the "P" in "HPRtree".


Reading some more, List.remove(Object o) adds the "first" semantics to the inherited Collection.remove(Object o) method, which is otherwise the same, so it should be fine to just remove any one item that we encounter within the bounds.

Sounds good. So my best idea for providing this without breaking API is to add a setUseEquals method with an internal flag. Runner-up would be to add a removeEquals method, or perhaps an overloaded remove with a flag parameter. Thoughts?

My favorite idea at the moment is similar to the "remove overload with a flag parameter" idea, except instead of a flag*, the new parameter is an object that implements the actual logic for testing equals, as this would provide the most flexibility. RBush actually works the exact same way, so there's a precedent.

  • *I generally don't like boolean flag parameters, especially when callers are extremely likely to pass constant true / false values, since it adds a bit of friction when reading through the code later ("what does this "true" mean?").

Something like setUseOverriddenEqualsDuringRemove(true) / setRemoveUsesReferenceEquality(false) would also solve the problem, of course.

airbreather avatar Nov 22 '19 16:11 airbreather

My favorite idea at the moment is similar to the "remove overload with a flag parameter" idea, except instead of a flag*, the new parameter is an object that implements the actual logic for testing equals, as this would provide the most flexibility. RBush actually works the exact same way, so there's a precedent.

Perhaps passing a Comparable would be the most standard? There's no standard interface for just equals (in Java at least) AFAIK.

Otherwise a new interface is needed... not necessarily a bad thing, but more machinery to write and document.

dr-jts avatar Nov 22 '19 16:11 dr-jts

Otherwise a new interface is needed... not necessarily a bad thing, but more machinery to write and document.

There's always BiPredicate... https://docs.oracle.com/javase/8/docs/api/java/util/function/BiPredicate.html

dbaston avatar Nov 22 '19 16:11 dbaston

Otherwise a new interface is needed... not necessarily a bad thing, but more machinery to write and document.

There's always BiPredicate... https://docs.oracle.com/javase/8/docs/api/java/util/function/BiPredicate.html

Yeah... have been avoiding too much new-fangled stuff in JTS, for whatever reason (makes it harder for porting? Better backward compatibility?). But the functional stuff has a big appeal for sure.

Generics first though!

dr-jts avatar Nov 22 '19 16:11 dr-jts

FWIW, I just noticed that Quadtree is actually already doing what I claim is "the right thing":

https://github.com/locationtech/jts/blob/3fedb393ec02ccb58862c0caca65dfb544863445/modules/core/src/main/java/org/locationtech/jts/index/quadtree/NodeBase.java#L108

airbreather avatar Dec 12 '19 22:12 airbreather