bdit_data-sources icon indicating copy to clipboard operation
bdit_data-sources copied to clipboard

HERE - Map / Data Mismatch

Open aharpalaniTO opened this issue 5 years ago • 8 comments

Arising from https://github.com/CityofToronto/bdit_data_requests/issues/39, did some digging around the newer HERE data and found some issues (not sure if this is something that is a known issue, or perhaps I'm just looking at the wrong tables @radumas @chmnata)

New data (here.ta_202001) and new streets layer (here_gis.streets_19_4)

Query:

WITH link_data AS (	
	SELECT link_dir, LEFT(link_dir,-1)::bigint AS link_id
	FROM here.ta_202001
	GROUP BY link_dir, LEFT(link_dir,-1)::bigint
),	
	
max_links_by_fc AS (	
	SELECT func_class, SUM(CASE WHEN dir_travel = 'B' THEN 2 ELSE 1 END) AS num_links
	FROM here_gis.streets_att_19_4
	WHERE ar_auto = 'Y'
	GROUP BY func_class
)	
	
SELECT func_class, AVG(C.num_links) AS total_links, COUNT(1) AS total_links_w_data	
FROM here_gis.streets_att_19_4 A	
INNER JOIN link_data B USING (link_id)	
INNER JOIN max_links_by_fc C USING (func_class)	
GROUP BY func_class	
ORDER BY func_class	

Results:

func_class total_links total_links_w_data
1 609 603
2 1197 1177
3 3952 3404
4 30772 25453
5 245449 115899

Old data (here.ta_202001_old) and new streets layer (here_gis.streets_18_3)

Query:

WITH link_data AS (
	SELECT link_dir, LEFT(link_dir,-1)::bigint AS link_id
	FROM here.ta_202001_old
	GROUP BY link_dir, LEFT(link_dir,-1)::bigint
),

max_links_by_fc AS (
	SELECT func_class, SUM(CASE WHEN dir_travel = 'B' THEN 2 ELSE 1 END) AS num_links
	FROM here_gis.streets_att_18_3
	WHERE ar_auto = 'Y'
	GROUP BY func_class
)

SELECT func_class, AVG(C.num_links) AS total_links, COUNT(1) AS total_links_w_data
FROM here_gis.streets_att_18_3 A
INNER JOIN link_data B USING (link_id)
INNER JOIN max_links_by_fc C USING (func_class)
GROUP BY func_class
ORDER BY func_class

Results:

func_class total_links total_links_w_data
1 605 599
2 1166 1148
3 3542 3534
4 26894 26809
5 127631 120732

aharpalaniTO avatar Jun 29 '20 03:06 aharpalaniTO

I'm assuming I'm just looking at the wrong tables - will check back in tomorrow morning

WITH links AS (
SELECT link_dir
FROM here.ta_202001
GROUP BY link_dir
	)

SELECT COUNT(1) AS total_links, SUM(CASE WHEN link_id IS NOT NULL THEN 1 ELSE 0 END) AS links_found_in_map
FROM links A
LEFT JOIN here_gis.streets_19_4 B ON LEFT(A.link_dir,-1)::bigint = B.link_id

new data: 158,208 links with data, only 146,536 on map

WITH links AS (
SELECT link_dir
FROM here.ta_202001_old
GROUP BY link_dir
	)

SELECT COUNT(1) AS total_links, SUM(CASE WHEN link_id IS NOT NULL THEN 1 ELSE 0 END) AS links_found_in_map
FROM links A
LEFT JOIN here_gis.streets_18_3 B ON LEFT(A.link_dir,-1)::bigint = B.link_id

old data: 152,822 links with data, all on map

aharpalaniTO avatar Jun 29 '20 03:06 aharpalaniTO

Keeping these two issues separate for a moment I would propose two (hopefully) quick analyses:

  1. Investigate the length and other characteristics of the links that don't have data. I suspect they might be really short, but I don't know why path processing wouldn't give them data. There have been odd quirks in the network leading to low data at certain locations (mostly thinking of Gardiner-Lakeshore).
  2. re-upload the streets dataset to find where unmapped link_dirs are occurring, if this is a boundary issue or something else.

Or we could just prepare two lists of links and ask our vendor what's up.

radumas avatar Jun 29 '20 17:06 radumas

I'll take a crack at 2. later today

aharpalaniTO avatar Jun 29 '20 19:06 aharpalaniTO

Took a look at links with missing data for functional class 4 and above. Red means that at least one valid direction is missing data in January 2020. Functional class 5 links look much worse (not pictured here for clarity).

image

DROP TABLE aharpal.here_19_4_datacheck;
CREATE TABLE aharpal.here_19_4_datacheck AS

WITH links_data AS (	
	SELECT 	LEFT(link_dir,-1)::bigint AS link_id, COUNT(1) AS num_links_w_data
	FROM 	here.ta_202001
	GROUP BY LEFT(link_dir,-1)::bigint
)
	
SELECT		ROW_NUMBER() OVER (ORDER BY link_id) AS row_num, 
			link_id, 
			A.geom, 
			B.func_class, 
			(CASE WHEN B.dir_travel = 'B' THEN 2 ELSE 1 END > COALESCE(C.num_links_w_data,0)) AS missing_data

FROM		here_gis.streets_19_4 A 
INNER JOIN	here_gis.streets_att_19_4 B USING (link_id)
LEFT JOIN 	links_data C USING (link_id)
WHERE 		ar_auto = 'Y' AND func_class NOT IN ('5');

ALTER TABLE  aharpal.here_19_4_datacheck ADD PRIMARY KEY (row_num);

aharpalaniTO avatar Jun 30 '20 02:06 aharpalaniTO

For comparison, here_gis.streets_18_3 using old data:

image

aharpalaniTO avatar Jun 30 '20 02:06 aharpalaniTO

We are encountering the same problem again with 21_1 map version. This time we are using the map version we received from TC. However:

  • we are receiving data from TA for old links
  • there are links in the new TC map for which we are receiving no TA data
  • there are links for which we are receiving less TA data for the same month

Quick example:

-- Example using an updated link on Danforth

-- Old Link
-- returns link in danforth
select * from here_gis.streets_att_19_4_tc where link_id = 995444929
-- there is data
select * from here.ta_201901
where link_dir = '995444929F'
limit 10

-- New links in 21_1 that make up link_dir '995444929F'
--returns link in danforth (shows up in new map 21_1)
select * from here_gis.streets_att_21_1 where link_id in (1291285372, 1291285373)
-- returns null (does not show up in old map 19_4_tc)
select * from here_gis.streets_att_19_4_tc where link_id in (1291285372, 1291285373)
-- but there is no data for the new links 21_1
select * from here.ta
where link_dir = '1291285373F'

Next step: Update map_version_comparison notebook Important Note: We need to have a more systematic check on this particular issue every time we get a new map version.

chmnata avatar Jun 09 '21 16:06 chmnata

Where are the links with no data located?

Most of the links without data for version 19_4_tc are located on the edge of the toronto boundary, and there are about 6 distinct segments located within the boundary of toronto. Most of links without data for version 21_1 also located on the edge of the toronto boundary, however there are significantly more segments without data located within the the boundary of toronto.

Version 19_4_tc - Links without data image Version 21_1 - Links without data image

chmnata avatar Jun 09 '21 18:06 chmnata

Those links at the Toronto boundary seem perpendicular to it. I'm inferring that the geometry is being included in what we call "Toronto" through the clipping process when they are in fact outside the jurisdiction for which we receive traffic data.

radumas avatar Jun 11 '21 02:06 radumas