simplefeatures icon indicating copy to clipboard operation
simplefeatures copied to clipboard

DCEL Optimisation: consider alternate nodeSet data structure

Open peterstace opened this issue 4 years ago • 1 comments

The purpose of the nodeSet data structure is to keep track of XY values in the DCEL. As new XYs are inserted, it searches for existing XYs that are too close together (and merges them if they are).

It is currently implemented as a map from an (int,int) bucket to an XY. When an insertOrGet operation occurs, we look in 9 separate buckets to see if what we're looking for already exists. While this is constant time, it's not particularly fast (since we're doing 9 separate map access operations). The main advantage of the current nodeSet implementation is that it is incredibly simple (both conceptually simple only a few dozen lines of code).

Right now, nodeSet operations take up about 25% of the CPU time needed for the intersection benchmark test.

This task is to experiment with alternate data structures that may be more performant. Potentially the RTree could be tried. Although, because we only have an XY dataset, we could also consider other data structures such as k-d trees.

peterstace avatar Dec 02 '20 18:12 peterstace

Note that a slightly different map structure is now used (see https://github.com/peterstace/simplefeatures/pull/324 for details). I still suspect that a k-d tree would be faster.

peterstace avatar Dec 31 '20 23:12 peterstace