activerecord-postgis-adapter
activerecord-postgis-adapter copied to clipboard
Invalid interpretation of where.not clause
As I cannot find that our bug is intended for this gem, I am opening the issue.
In one place we need to search data by NOT matching specific GPS point. When using SQL condition, it works, but query is corrupted when using ActiveRecord.
some schema information
create_table "orders", force: :cascade do |t|
...
t.geography "gps", limit: {:srid=>4326, :type=>"st_point", :geographic=>true}
...
end
Let's take a look at the following snippet with SQL statement.
order = Order.find(some_id)
Order.where(id: order.id).where("gps = ST_SetSRID( ST_Point( #{order.gps.x}, #{order.gps.y}), 4326)").explain
=>
EXPLAIN for: SELECT "orders".* FROM "orders" WHERE "orders"."id" = $1 AND (gps = ST_SetSRID( ST_Point( 14.8569972, 50.8526169), 4326)) [["id", 12009736]]
QUERY PLAN
-------------------------------------------------------------------------------------------------
Index Scan using orders_id_company_branch_id_idx on orders (cost=0.43..2.66 rows=1 width=1756)
Index Cond: (id = '12009736'::bigint)
Filter: (gps = '0101000020E6100000F3864556C8B62D402AC2F28C226D4940'::geography)
(3 rows)
Order.where(id: order.id).where.not("gps = ST_SetSRID( ST_Point( #{order.gps.x}, #{order.gps.y}), 4326)").explain
=>
EXPLAIN for: SELECT "orders".* FROM "orders" WHERE "orders"."id" = $1 AND NOT (gps = ST_SetSRID( ST_Point( 14.8569972, 50.8526169), 4326)) [["id", 12009736]]
QUERY PLAN
-------------------------------------------------------------------------------------------------
Index Scan using orders_id_company_branch_id_idx on orders (cost=0.43..2.66 rows=1 width=1756)
Index Cond: (id = '12009736'::bigint)
Filter: (NOT (gps = '0101000020E6100000F3864556C8B62D402AC2F28C226D4940'::geography))
(3 rows)
But when we use ActiveRecord referencing order.gps
, data processed by DB are corrupted using where.not
condition.
order = Order.find(some_id)
Order.where(id: order.id).where(gps: order.gps).explain
=>
EXPLAIN for: SELECT "orders".* FROM "orders" WHERE "orders"."id" = $1 AND "orders"."gps" = $2 [["id", 12009736], ["gps", "0020000001000010e6402db6c8564586f340496d228cf2c22a"]]
QUERY PLAN
-------------------------------------------------------------------------------------------------
Index Scan using orders_id_company_branch_id_idx on orders (cost=0.43..2.66 rows=1 width=1756)
Index Cond: (id = '12009736'::bigint)
Filter: (gps = '0101000020E6100000F3864556C8B62D402AC2F28C226D4940'::geography)
(3 rows)
Order.where(id: order.id).where.not(gps: order.gps).explain
=>
EXPLAIN for: SELECT "orders".* FROM "orders" WHERE "orders"."id" = $1 AND "orders"."gps" != $2 [["id", 12009736], ["gps", "0020000001000010e6402db6c8564586f340496d228cf2c22a"]]
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using orders_id_company_branch_id_idx on orders (cost=0.43..2.66 rows=1 width=1756)
Index Cond: (id = '12009736'::bigint)
Filter: ((gps)::bytea <> '\x3030323030303030303130303030313065363430326462366338353634353836663334303439366432323863663263323261'::bytea)
(3 rows)
The result of this behavior is that each statement using AR is returning the order
object. It seems to be wrong to me. Rails passes correctly serialized GPS point (0020000001000010e6402db6c8564586f340496d228cf2c22a
) as filter condition is same as in case using ST_Point
, but where.not
looks like it performs some extra encoding of data.
Side note. IIRC Order
as a rails model seems OK, but if you get meta at all with your code (e.g. composing complex queries with Arel) you can end up with some very confusing scenarious while debugging etc. YRMV.
Thanks for the thorough examples. I'll start looking into this soon and post any updates here.
I dug into this some more and was able to replicate it. I also tried it with a GEOS backed factory and got a similar but slightly different result.
EXPLAIN for: SELECT "spatial_models".* FROM "spatial_models" WHERE "spatial_models"."latlon" != $1 [["latlon", "002000000100000ec93ff00000000000003ff0000000000000"]]
QUERY PLAN
------------------------------------------------------------------------------------------
Seq Scan on spatial_models (cost=0.00..77.38 rows=488 width=136)
Filter: ((latlon)::text <> '002000000100000ec93ff00000000000003ff0000000000000'::text)
For some reason it's trying to cast it to text
instead of bytea
this time. Maybe it has something to do with how geography is encoded vs geometry columns.
The thing that seems strange is that the query looks ok to me, but the planner decides that it should try to compare text instead of using geography/geometry types when the !=
comparison is added.
In fact, I can do the same thing by querying my database directly and when using !=
for comparison, I see that the planner wants to cast it to text
again and for =
it casts the wkb to a geometry
.
When I force the wkb
to be cast to a geometry
like so:
EXPLAIN SELECT id FROM spatial_models WHERE latlon != '002000000100000ec93ff00000000000003ff0000000000000'::geometry;
I get this error:
ERROR: operator is not unique: geometry <> geometry
HINT: Could not choose a best candidate operator. You might need to add explicit type casts.
So it seems this is some sort of implementation issue in PostGIS. I'll look more into their docs to see what they say about it. A potential solution could be to override the default where query when an RGeo node is detected and prefer ST_Equals
instead, but that has other implications because that checks for being spatially equal which is different than just using the =
operator.