cloudberry icon indicating copy to clipboard operation
cloudberry copied to clipboard

[Bug] The result of query with geometric operators(polygon type) always be wrong

Open jiaqizho opened this issue 4 months ago • 2 comments

Apache Cloudberry version

main

What happened

sql - create POINT_TBL

CREATE TABLE POINT_TBL(f1 point);
INSERT INTO POINT_TBL(f1) VALUES ('(0.0,0.0)');
INSERT INTO POINT_TBL(f1) VALUES ('(-10.0,0.0)');
INSERT INTO POINT_TBL(f1) VALUES ('(-3.0,4.0)');
INSERT INTO POINT_TBL(f1) VALUES ('(5.1, 34.5)');
INSERT INTO POINT_TBL(f1) VALUES ('(-5.0,-12.0)');
INSERT INTO POINT_TBL(f1) VALUES ('(1e-300,-1e-300)');
INSERT INTO POINT_TBL(f1) VALUES ('(1e+300,Inf)');
INSERT INTO POINT_TBL(f1) VALUES ('(Inf,1e+300)');
INSERT INTO POINT_TBL(f1) VALUES (' ( Nan , NaN ) ');
INSERT INTO POINT_TBL(f1) VALUES ('10.0,10.0');
INSERT INTO POINT_TBL(f1) VALUES (NULL);
SELECT * FROM POINT_TBL;

and the query with polygon expression

SELECT count(*) FROM point_tbl WHERE f1 <@ polygon '(0,0),(0,100),(100,100),(50,50),(100,0),(0,0)';

the result of SeqScan and IndexOnlyScan/IndexScan is different.

postgres=# set enable_indexscan to off;
SET
postgres=# set enable_indexonlyscan to off;
SET
postgres=# set enable_seqscan to on;
SET
postgres=# explain SELECT count(*) FROM point_tbl WHERE f1 <@ polygon '(0,0),(0,100),(100,100),(50,50),(100,0),(0,0)';
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=1.07..1.08 rows=1 width=8)
   ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..1.07 rows=1 width=0)
         ->  Seq Scan on point_tbl  (cost=0.00..1.05 rows=1 width=0)
               Filter: (f1 <@ '((0,0),(0,100),(100,100),(50,50),(100,0),(0,0))'::polygon)
 Optimizer: Postgres query optimizer
(5 rows)

postgres=# SELECT count(*) FROM point_tbl WHERE f1 <@ polygon '(0,0),(0,100),(100,100),(50,50),(100,0),(0,0)';
 count
-------
     5
(1 row)

postgres=#
postgres=# set enable_seqscan to off;
SET
postgres=# set enable_indexscan to on;
SET
postgres=# set enable_indexonlyscan to on;
SET
postgres=# explain SELECT count(*) FROM point_tbl WHERE f1 <@ polygon '(0,0),(0,100),(100,100),(50,50),(100,0),(0,0)';
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Aggregate  (cost=8.17..8.18 rows=1 width=8)
   ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.13..8.17 rows=1 width=0)
         ->  Index Only Scan using gpointind on point_tbl  (cost=0.13..8.15 rows=1 width=0)
               Index Cond: (f1 <@ '((0,0),(0,100),(100,100),(50,50),(100,0),(0,0))'::polygon)
 Optimizer: Postgres query optimizer
(5 rows)

postgres=# SELECT count(*) FROM point_tbl WHERE f1 <@ polygon '(0,0),(0,100),(100,100),(50,50),(100,0),(0,0)';
 count
-------
     4
(1 row)

Also i found another problem: The point ((1e-300,-1e-300)) always in the result.

postgres=# set enable_indexscan to off;
SET
postgres=# set enable_indexonlyscan to off;
SET
postgres=# set enable_seqscan to on;
SET
postgres=# SELECT * FROM point_tbl WHERE f1 <@ polygon '(0,0),(0,100),(100,100),(50,50),(100,0),(0,0)';
        f1
------------------
 (1e-300,-1e-300)
 (NaN,NaN)
 (0,0)
 (5.1,34.5)
 (10,10)
(5 rows)

postgres=# set enable_seqscan to off;
SET
postgres=# set enable_indexscan to on;
SET
postgres=# set enable_indexonlyscan to on;
SET
postgres=# SELECT * FROM point_tbl WHERE f1 <@ polygon '(0,0),(0,100),(100,100),(50,50),(100,0),(0,0)';
        f1
------------------
 (0,0)
 (5.1,34.5)
 (10,10)
 (1e-300,-1e-300)
(4 rows)

i guess the polygon looks like(not sure):

y
↑
| (0,100)  *───────────────────────* (100,100)
|          │                     /
|          │                    / 
|          │                   /
|          │                  /
|          │                 /
|          │                /
|          │               /
|          │      (50,50) *
|          │               \
|          │                \
|          │                 \
|          │                  \
|          │                   \
|          │                    \
|          │                     \
| (0,0) *──┼───────────────────────* (100,0)
|
|
└───────────────────────────────────> x

And the point (1e-300,-1e-300) should be left of the line ((0,0) , (0,100)), because its Y(-1e-300) is a neg value.

What you think should happen instead

No response

How to reproduce

nope

Operating System

all

Anything else

No response

Are you willing to submit PR?

  • [ ] Yes, I am willing to submit a PR!

Code of Conduct

jiaqizho avatar Aug 01 '25 12:08 jiaqizho

Hi, @jiaqizho welcome!🎊 Thanks for taking the time to point this out.🙌

github-actions[bot] avatar Aug 01 '25 12:08 github-actions[bot]

Interesting. I test this on Postgres, it returned 5 rows too if disable using indexes including bitmap.

explain SELECT * FROM point_tbl WHERE f1 <@ polygon '(0,0),(0,100),(100,100),(50,50),(100,0),(0,0)';
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Index Only Scan using gpointind on point_tbl  (cost=0.13..8.15 rows=1 width=16)
   Index Cond: (f1 <@ '((0,0),(0,100),(100,100),(50,50),(100,0),(0,0))'::polygon)
(2 rows)



SELECT * FROM point_tbl WHERE f1 <@ polygon '(0,0),(0,100),(100,100),(50,50),(100,0),(0,0)';
        f1
------------------
 (1e-300,-1e-300)
 (0,0)
 (10,10)
 (5.1,34.5)
(4 rows)

set enable_indexonlyscan = off;
set enable_indexscan = off;
set enable_bitmapscan = off;

explain(costs off, verbose) SELECT * FROM point_tbl WHERE f1 <@ polygon '(0,0),(0,100),(100,100),(50,50),(100,0),(0,0)';
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Seq Scan on tpcds.point_tbl
   Disabled: true
   Output: f1
   Filter: (point_tbl.f1 <@ '((0,0),(0,100),(100,100),(50,50),(100,0),(0,0))'::polygon)
(4 rows)

SELECT * FROM point_tbl WHERE f1 <@ polygon '(0,0),(0,100),(100,100),(50,50),(100,0),(0,0)';
        f1
------------------
 (0,0)
 (5.1,34.5)
 (1e-300,-1e-300)
 (NaN,NaN)
 (10,10)
(5 rows


avamingli avatar Aug 04 '25 06:08 avamingli