duckdb_spatial
duckdb_spatial copied to clipboard
Spatial join implementation missing for `ST_DWITHIN_SPHEROID` predicate
Efficient spatial joins are a game-changer! Looks like they are not yet implemented for st_dwithin_spheroid, and this implementation would be super useful. A toy example query might be something like: "find all users, with multiplicity, within a certain radius of all stores."
To demonstrate, here's a simple example:
- Create two tables of random points that could be interpreted as on the Earth
install spatial;
load spatial;
create or replace table foo as (select st_point(180 * random() - 90, 360 * random() - 180) as foo_point from range(10000));
create or replace table bar as (select st_point(180 * random() - 90, 360 * random() - 180) as bar_point from range(10000));
- Query 1: "Within" join on flat geometry
select * from foo inner join bar on st_dwithin(foo.foo_point, bar.bar_point, 50);
-
explainshows the join type asSPATIAL_JOIN.explain analyzeshows fast performance:
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││ Total Time: 0.519s ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│ QUERY │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ EXPLAIN_ANALYZE │
│ ──────────────────── │
│ 0 Rows │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ EXTENSION │
│ ──────────────────── │
│ Join Type: INNER │
│ │
│ Conditions: │
│ ST_DWithin(foo_point, ├──────────────┐
│ bar_point) │ │
│ │ │
│ 10100910 Rows │ │
│ (0.51s) │ │
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ TABLE_SCAN ││ TABLE_SCAN │
│ ──────────────────── ││ ──────────────────── │
│ Table: foo ││ Table: bar │
│ Type: Sequential Scan ││ Type: Sequential Scan │
│ ││ │
│ Projections: ││ Projections: │
│ foo_point ││ bar_point │
│ ││ │
│ 10000 Rows ││ 10000 Rows │
│ (0.00s) ││ (0.00s) │
└───────────────────────────┘└───────────────────────────┘
- Query 2: "Within" join on spherical geometry (radius chosen to achieve a similar-size result set as the previous query)
select * from foo inner join bar on st_dwithin_spheroid(foo.foo_point, bar.bar_point, 5000000);
-
explainshows the join type asBLOCKWISE_NL_JOIN.explain analyzeshows dramatically slower performance despite the similar-sized result set:
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││ Total Time: 85.51s ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│ QUERY │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ EXPLAIN_ANALYZE │
│ ──────────────────── │
│ 0 Rows │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ BLOCKWISE_NL_JOIN │
│ ──────────────────── │
│ Join Type: INNER │
│ │
│ Condition: │
│ ST_DWithin_Spheroid(CAST │
│ (foo_point AS POINT_2D), ├──────────────┐
│ CAST(bar_point AS │ │
│ POINT_2D), 5000000.0) │ │
│ │ │
│ 17163875 Rows │ │
│ (85.49s) │ │
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ TABLE_SCAN ││ TABLE_SCAN │
│ ──────────────────── ││ ──────────────────── │
│ Table: foo ││ Table: bar │
│ Type: Sequential Scan ││ Type: Sequential Scan │
│ ││ │
│ Projections: ││ Projections: │
│ foo_point ││ bar_point │
│ ││ │
│ 10000 Rows ││ 10000 Rows │
│ (0.00s) ││ (0.00s) │
└───────────────────────────┘└───────────────────────────┘
Hello! Thanks for opening this issue. Yes, this is currently expected, spheroid/sphere based joins require a more advanced indexing structure to handle wraparound the earth. But it is definitely something we would like to support eventually.