duckdb_spatial
duckdb_spatial copied to clipboard
Spatial predicates impacts other filter values
Hi Here is a pretty straightforward SQL query :
SELECT d.fidji_valeur_fonciere__integer
FROM dvf.disposition d
WHERE d.fidji_valeur_fonciere__integer < 50000
ORDER BY d.fidji_valeur_fonciere__integer DESC ;
with a pretty straightforward result starting with values at 49,999
If I add a spatial filter (either Intersects, Overlaps or contains) I get different results :
SELECT d.fidji_valeur_fonciere__integer
FROM dvf.disposition d
WHERE d.fidji_valeur_fonciere__integer < 50000
AND ST_Intersects(ST_MakeEnvelope(1.526264074441798, 45.15464025455313, 1.5394900973676613, 45.16297435294794), d.geom)
ORDER BY d.fidji_valeur_fonciere__integer DESC ;
now the results start with a 65500 value, not respecting the d.fidji_valeur_fonciere__integer < 50000 clause. If I remove the d.fidji_valeur_fonciere__integer < 50000 clause, I have even bigger results as expected. But it's like using the spatial function thwarts the other filter from being executed totally....
Hello! I am unable to reproduce this behavior. Here's what I've tried:
-- Create a table with 100 diagonal points
CREATE TABLE t1 as SELECT ST_Point(x, x) as g, x as i FROM range(0,100) r(x);
-- Filter
SELECT * FROM t1 WHERE i > 5 AND st_intersects(g, ST_MakeEnvelope(0, 0, 10, 10)) ORDER BY i DESC;
┌───────────────┬───────┐
│ g │ i │
│ geometry │ int64 │
├───────────────┼───────┤
│ POINT (10 10) │ 10 │
│ POINT (9 9) │ 9 │
│ POINT (8 8) │ 8 │
│ POINT (7 7) │ 7 │
│ POINT (6 6) │ 6 │
└───────────────┴───────┘
This all seems correct to me.
- What version of DuckDB are you using.
- Can you share the dataset you're working with?
- What is the query plan for the "wrong" query? I.E. what is the output if you run
EXPLAIN <your query>?
Hi Max,
Thanks for taking take of my issue. I've been using last version, currently v1.2.1. I can share the dataset but it's big. I'm currently uploading a parquet file of the table on r2 may it may take some time as I am out of the office with a very bad internet connection. Here is the EXPLAIN statement output :
Physical Plan ┌───────────────────────────┐ │ PROJECTION │ │ ──────────────────── │ │__internal_decompress_integ│ │ ral_bigint(#0, 0) │ │ │ │ ~592481 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ ORDER_BY │ │ ──────────────────── │ │d.fidji_valeur_fonciere__in│ │ teger DESC │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION │ │ ──────────────────── │ │__internal_compress_integra│ │ l_usmallint(#0, 0) │ │ │ │ ~592481 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION │ │ ──────────────────── │ │ #0 │ │ │ │ ~592481 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ FILTER │ │ ──────────────────── │ │ ST_Intersects('\x02\x04 │ │ \x00\x00\x00\x00\x00\x00 │ │ \x9F\x5C\xC3?Z\x9E4B\x03 │ │ \x0E\xC5?\xE3\xA64B\x02 │ │ \x00\x00\x00\x01\x00\x00 │ │ \x00\x05\x00\x00\x00\x00 │ │ \x00\x00\x00\x80\x98\xCC │ │ \xE0\x93k\xF8?\xB4\xF9y@ │ │ \xCB\x93F@\x80\x98\xCC\xE0│ │ \x93k\xF8?\xC4\xFF\xF5W │ │ \xDC\x94F@\x00\x5CK^\xC0 │ │ \xA1\xF8?\xC4\xFF\xF5W\xDC│ │ \x94F@\x00\x5CK^\xC0\xA1 │ │ \xF8?\xB4\xF9y@\xCB\x93F@ │ │ \x80\x98\xCC\xE0\x93k\xF8?│ │ \xB4\xF9y@\xCB\x93F@': │ │ :GEOMETRY, geom) │ │ │ │ ~592481 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ FILTER │ │ ──────────────────── │ │ (#0 < 50000) │ │ │ │ ~14812040 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ RTREE_INDEX_SCAN │ │ ──────────────────── │ │ Table: disposition │ │ │ │ Index: │ │ idx_disposition_geom │ │ │ │ Projections: │ │fidji_valeur_fonciere__inte│ │ ger │ │ geom │ │ │ │ ~14812040 Rows │ └───────────────────────────┘
Thanks Guillaume
As I just ran the queries again to provide the output I don't see the problem happening again. don't bother do anything right now I'll check into it deeper later this week
Cheers