bdit_data-sources icon indicating copy to clipboard operation
bdit_data-sources copied to clipboard

We need better documentation for the partitioned tables in `gis_core`

Open radumas opened this issue 1 year ago • 6 comments

How should people be using them? Can we have a simple way of accessing the latest data? A Mat View maybe?

radumas avatar Dec 20 '23 03:12 radumas

Here's a quick potential solution I came up with using information_schema and execute_format.

CREATE OR REPLACE FUNCTION gwolofs.centreline_latest()
RETURNS TABLE (
    version_date date,
    centreline_id integer,
    linear_name_id integer,
    linear_name_full text,
    linear_name_full_legal text,
    address_l text,
    address_r text,
    parity_l text,
    parity_r text,
    lo_num_l integer,
    hi_num_l integer,
    lo_num_r integer,
    hi_num_r integer,
    begin_addr_point_id_l integer,
    end_addr_point_id_l integer,
    begin_addr_point_id_r integer,
    end_addr_point_id_r integer,
    begin_addr_l integer,
    end_addr_l integer,
    begin_addr_r integer,
    end_addr_r integer,
    linear_name text,
    linear_name_type text,
    linear_name_dir text,
    linear_name_desc text,
    linear_name_label text,
    from_intersection_id integer,
    to_intersection_id integer,
    oneway_dir_code integer,
    oneway_dir_code_desc text,
    feature_code integer,
    feature_code_desc text,
    jurisdiction text,
    centreline_status text,
    shape_length numeric,
    objectid integer,
    shape_len numeric,
    mi_prinx integer,
    low_num_odd integer,
    high_num_odd integer,
    low_num_even integer,
    high_num_even integer,
    geom geometry
)
LANGUAGE plpgsql AS
$func$

DECLARE
    partition_name TEXT := (
        SELECT table_name
        FROM information_schema.tables
        WHERE
            table_schema = 'gis_core'
            AND table_name LIKE 'centreline_20%'
        ORDER BY table_name DESC
        LIMIT 1
    );

BEGIN
   RETURN QUERY
EXECUTE FORMAT(
        $$SELECT version_date, centreline_id, linear_name_id, linear_name_full, linear_name_full_legal, address_l, address_r, parity_l, parity_r, lo_num_l, hi_num_l, lo_num_r, hi_num_r, begin_addr_point_id_l, end_addr_point_id_l, begin_addr_point_id_r, end_addr_point_id_r, begin_addr_l, end_addr_l, begin_addr_r, end_addr_r, linear_name, linear_name_type, linear_name_dir, linear_name_desc, linear_name_label, from_intersection_id, to_intersection_id, oneway_dir_code, oneway_dir_code_desc, feature_code, feature_code_desc, jurisdiction, centreline_status, shape_length, objectid, shape_len, mi_prinx, low_num_odd, high_num_odd, low_num_even, high_num_even, geom
          FROM gis_core.%s
        $$,
          partition_name);
END
$func$;

--usage example
SELECT version_date, geom
FROM gwolofs.centreline_latest()
WHERE linear_name_full = 'Lake Shore Blvd W';

--same performance when selecting from partition directly:
SELECT version_date, geom
FROM gis_Core.centreline_20231001
WHERE linear_name_full = 'Lake Shore Blvd W';

gabrielwol avatar Dec 20 '23 15:12 gabrielwol

Wondering if something like this is a simpler solution to the problem of finding the latest data.

CREATE OR REPLACE VIEW gis_core.intersection_latest AS

SELECT *
FROM gis_core.intersection
WHERE version_date = (SELECT MAX(version_date) FROM gis_core.intersection)

@gabrielwol a function might be quicker, but personally I would probably never think to look for data in a function. I would look at the tables and views. The solution above only takes about a fifth of a second to scan the table.

Nate-Wessel avatar Jan 03 '24 16:01 Nate-Wessel

I stand corrected, your view seems to be just as fast as the function I created and much less complex!

gabrielwol avatar Jan 03 '24 17:01 gabrielwol

I've created a branch for this and will slowly work on it in parallel with my HIN work. Please feel free to review, contribute or correct anything in there. A lot of my understanding of the centreline data at this point is educated guesswork.

Nate-Wessel avatar Jan 26 '24 14:01 Nate-Wessel

I created a new directional routing function that uses the max(version_date) centreline table in gis_core last week, noting this here so I will rmb to add it in the doc :meow_dio:

chmnata avatar Jan 29 '24 16:01 chmnata

Was curious to see the performance between a view and a mat view with more data, so I did a test on what our centreline table would look like in the year 2️⃣ 0️⃣ 2️⃣ 7️⃣

Both view and table are created using max(version_date):

SELECT *
FROM natalie.centreline_chonk 
WHERE version_date = (SELECT MAX(version_date) FROM gis_core.centreline)

Selecting and joining

Mat view is a bit faster

-- 155 msec.
select * from natalie.centreline_latest_mat
inner join miovision_api.centreline_miovision using (centreline_id)
-- 298 msec.
select * from natalie.centreline_latest
inner join miovision_api.centreline_miovision using (centreline_id)

Routing

Created routing function based on each view and mat view, the performance of routing 15 sets of OD: Mat view is significantly faster.

with from_to(node_s, node_e) AS (
values 
	(13457965,13465931),(13465876,13467505),(13457965,13467253),
	(13467253,13465931),(13457965,13467124),(13467124,13465931),(13457965,13467505),
	(13467040,13465931),(13467124,13467040),(13467505,13465931),(13467505,13465931),
	(13465931,13465931),(13467253,13457965),(13467505,13457965),(13467253,13465931))

select * from from_to 
left join lateral  (select * from natalie.get_centreline_btwn_intersections(node_s, node_e))a ON true

The view returned in

-- natalie chonk view
16 secs 51 msec.

The mat view returned in

-- natalie chonk mat
6 secs 412 msec.

Simple spatial join

Mat view is just a little bit faster than a view.

-- 377 msec.
SELECT area_name, count(1) 
FROM natalie.centreline_latest_mat cent
inner join gis.neighbourhood nei on ST_intersects(cent.geom, nei.geom)
group by area_name
-- 457 msec.	
SELECT area_name, count(1) 
FROM natalie.centreline_latest cent
inner join gis.neighbourhood nei on ST_intersects(cent.geom, nei.geom)
group by area_name

Would expect a little more difference in performance when doing spatial join, especially with an index but apparently it doesn't super matter. However, since there is a significant improvement in efficiency when routing, I would suggest a mat view instead of a function or a view.

chmnata avatar Feb 15 '24 20:02 chmnata