bdit_data-sources
bdit_data-sources copied to clipboard
We need better documentation for the partitioned tables in `gis_core`
How should people be using them? Can we have a simple way of accessing the latest data? A Mat View maybe?
Here's a quick potential solution I came up with using information_schema and execute_format.
CREATE OR REPLACE FUNCTION gwolofs.centreline_latest()
RETURNS TABLE (
version_date date,
centreline_id integer,
linear_name_id integer,
linear_name_full text,
linear_name_full_legal text,
address_l text,
address_r text,
parity_l text,
parity_r text,
lo_num_l integer,
hi_num_l integer,
lo_num_r integer,
hi_num_r integer,
begin_addr_point_id_l integer,
end_addr_point_id_l integer,
begin_addr_point_id_r integer,
end_addr_point_id_r integer,
begin_addr_l integer,
end_addr_l integer,
begin_addr_r integer,
end_addr_r integer,
linear_name text,
linear_name_type text,
linear_name_dir text,
linear_name_desc text,
linear_name_label text,
from_intersection_id integer,
to_intersection_id integer,
oneway_dir_code integer,
oneway_dir_code_desc text,
feature_code integer,
feature_code_desc text,
jurisdiction text,
centreline_status text,
shape_length numeric,
objectid integer,
shape_len numeric,
mi_prinx integer,
low_num_odd integer,
high_num_odd integer,
low_num_even integer,
high_num_even integer,
geom geometry
)
LANGUAGE plpgsql AS
$func$
DECLARE
partition_name TEXT := (
SELECT table_name
FROM information_schema.tables
WHERE
table_schema = 'gis_core'
AND table_name LIKE 'centreline_20%'
ORDER BY table_name DESC
LIMIT 1
);
BEGIN
RETURN QUERY
EXECUTE FORMAT(
$$SELECT version_date, centreline_id, linear_name_id, linear_name_full, linear_name_full_legal, address_l, address_r, parity_l, parity_r, lo_num_l, hi_num_l, lo_num_r, hi_num_r, begin_addr_point_id_l, end_addr_point_id_l, begin_addr_point_id_r, end_addr_point_id_r, begin_addr_l, end_addr_l, begin_addr_r, end_addr_r, linear_name, linear_name_type, linear_name_dir, linear_name_desc, linear_name_label, from_intersection_id, to_intersection_id, oneway_dir_code, oneway_dir_code_desc, feature_code, feature_code_desc, jurisdiction, centreline_status, shape_length, objectid, shape_len, mi_prinx, low_num_odd, high_num_odd, low_num_even, high_num_even, geom
FROM gis_core.%s
$$,
partition_name);
END
$func$;
--usage example
SELECT version_date, geom
FROM gwolofs.centreline_latest()
WHERE linear_name_full = 'Lake Shore Blvd W';
--same performance when selecting from partition directly:
SELECT version_date, geom
FROM gis_Core.centreline_20231001
WHERE linear_name_full = 'Lake Shore Blvd W';
Wondering if something like this is a simpler solution to the problem of finding the latest data.
CREATE OR REPLACE VIEW gis_core.intersection_latest AS
SELECT *
FROM gis_core.intersection
WHERE version_date = (SELECT MAX(version_date) FROM gis_core.intersection)
@gabrielwol a function might be quicker, but personally I would probably never think to look for data in a function. I would look at the tables and views. The solution above only takes about a fifth of a second to scan the table.
I stand corrected, your view seems to be just as fast as the function I created and much less complex!
I've created a branch for this and will slowly work on it in parallel with my HIN work. Please feel free to review, contribute or correct anything in there. A lot of my understanding of the centreline data at this point is educated guesswork.
I created a new directional routing function that uses the max(version_date) centreline table in gis_core
last week, noting this here so I will rmb to add it in the doc :meow_dio:
Was curious to see the performance between a view and a mat view with more data, so I did a test on what our centreline table would look like in the year 2️⃣ 0️⃣ 2️⃣ 7️⃣
Both view and table are created using max(version_date):
SELECT *
FROM natalie.centreline_chonk
WHERE version_date = (SELECT MAX(version_date) FROM gis_core.centreline)
Selecting and joining
Mat view is a bit faster
-- 155 msec.
select * from natalie.centreline_latest_mat
inner join miovision_api.centreline_miovision using (centreline_id)
-- 298 msec.
select * from natalie.centreline_latest
inner join miovision_api.centreline_miovision using (centreline_id)
Routing
Created routing function based on each view and mat view, the performance of routing 15 sets of OD: Mat view is significantly faster.
with from_to(node_s, node_e) AS (
values
(13457965,13465931),(13465876,13467505),(13457965,13467253),
(13467253,13465931),(13457965,13467124),(13467124,13465931),(13457965,13467505),
(13467040,13465931),(13467124,13467040),(13467505,13465931),(13467505,13465931),
(13465931,13465931),(13467253,13457965),(13467505,13457965),(13467253,13465931))
select * from from_to
left join lateral (select * from natalie.get_centreline_btwn_intersections(node_s, node_e))a ON true
The view returned in
-- natalie chonk view
16 secs 51 msec.
The mat view returned in
-- natalie chonk mat
6 secs 412 msec.
Simple spatial join
Mat view is just a little bit faster than a view.
-- 377 msec.
SELECT area_name, count(1)
FROM natalie.centreline_latest_mat cent
inner join gis.neighbourhood nei on ST_intersects(cent.geom, nei.geom)
group by area_name
-- 457 msec.
SELECT area_name, count(1)
FROM natalie.centreline_latest cent
inner join gis.neighbourhood nei on ST_intersects(cent.geom, nei.geom)
group by area_name
Would expect a little more difference in performance when doing spatial join, especially with an index but apparently it doesn't super matter. However, since there is a significant improvement in efficiency when routing, I would suggest a mat view instead of a function or a view.