pys2index
pys2index copied to clipboard
Raw array based nearest-neighbor lookup
The idea is to rely only on raw array operations to perform nearest-neighbor lookup, i.e., not using any tree-based data structure for the index.
Those complex structures are hard to reuse in distributed settings (e.g., using dask).
s2geometry provide some nice features like region coverings that may optimize the problem (well) beyond brute-force lookup.
While experimenting with this library, I had a similar thought.
I am thinking of a separate Index class, which takes the search radius as additional constructor argument.
Based on the points and radius, it would get the S2Cap instances. Then it would translate the caps into their respective S2RegionCoverer which is equivalent to a configurable number of cell ids. Each cell id of a cap maps back to the original point index.
A simple implementation would store the cell ids and mapping indices in two arrays and sort both according to the cell ids. Than the approximate query boils down to a search in a sorted array to get the result candidates and finally computing and checking the actual distance for those candidates.
Thinking further about the general nearest neighbour lookup... In theory, you need to compute the Voronoi Diagram on the sphere for the given points. The query for a given point checks which tile contains the point. Using S2 as a numerical approximation, you would need to find a way to approximate the Voronoi tiles with disjoint cells. The challenge I see is finding a good balance between precision and a manageable number of cells. Therefore, the max allowed cell id level should depend on the density (or more precisely the distance between the points generating the face) which might not be homogeneous (see for instance https://www.jasondavies.com/maps/voronoi/airports/). Another little bit more complex approach would allow the index cells for each tile to overlap and in case of multiple hits check the distance more carefully.