bdit_data-sources
bdit_data-sources copied to clipboard
HERE - Map / Data Mismatch
Arising from https://github.com/CityofToronto/bdit_data_requests/issues/39, did some digging around the newer HERE data and found some issues (not sure if this is something that is a known issue, or perhaps I'm just looking at the wrong tables @radumas @chmnata)
New data (here.ta_202001) and new streets layer (here_gis.streets_19_4)
Query:
WITH link_data AS (
SELECT link_dir, LEFT(link_dir,-1)::bigint AS link_id
FROM here.ta_202001
GROUP BY link_dir, LEFT(link_dir,-1)::bigint
),
max_links_by_fc AS (
SELECT func_class, SUM(CASE WHEN dir_travel = 'B' THEN 2 ELSE 1 END) AS num_links
FROM here_gis.streets_att_19_4
WHERE ar_auto = 'Y'
GROUP BY func_class
)
SELECT func_class, AVG(C.num_links) AS total_links, COUNT(1) AS total_links_w_data
FROM here_gis.streets_att_19_4 A
INNER JOIN link_data B USING (link_id)
INNER JOIN max_links_by_fc C USING (func_class)
GROUP BY func_class
ORDER BY func_class
Results:
| func_class | total_links | total_links_w_data |
|---|---|---|
| 1 | 609 | 603 |
| 2 | 1197 | 1177 |
| 3 | 3952 | 3404 |
| 4 | 30772 | 25453 |
| 5 | 245449 | 115899 |
Old data (here.ta_202001_old) and new streets layer (here_gis.streets_18_3)
Query:
WITH link_data AS (
SELECT link_dir, LEFT(link_dir,-1)::bigint AS link_id
FROM here.ta_202001_old
GROUP BY link_dir, LEFT(link_dir,-1)::bigint
),
max_links_by_fc AS (
SELECT func_class, SUM(CASE WHEN dir_travel = 'B' THEN 2 ELSE 1 END) AS num_links
FROM here_gis.streets_att_18_3
WHERE ar_auto = 'Y'
GROUP BY func_class
)
SELECT func_class, AVG(C.num_links) AS total_links, COUNT(1) AS total_links_w_data
FROM here_gis.streets_att_18_3 A
INNER JOIN link_data B USING (link_id)
INNER JOIN max_links_by_fc C USING (func_class)
GROUP BY func_class
ORDER BY func_class
Results:
| func_class | total_links | total_links_w_data |
|---|---|---|
| 1 | 605 | 599 |
| 2 | 1166 | 1148 |
| 3 | 3542 | 3534 |
| 4 | 26894 | 26809 |
| 5 | 127631 | 120732 |
I'm assuming I'm just looking at the wrong tables - will check back in tomorrow morning
WITH links AS (
SELECT link_dir
FROM here.ta_202001
GROUP BY link_dir
)
SELECT COUNT(1) AS total_links, SUM(CASE WHEN link_id IS NOT NULL THEN 1 ELSE 0 END) AS links_found_in_map
FROM links A
LEFT JOIN here_gis.streets_19_4 B ON LEFT(A.link_dir,-1)::bigint = B.link_id
new data: 158,208 links with data, only 146,536 on map
WITH links AS (
SELECT link_dir
FROM here.ta_202001_old
GROUP BY link_dir
)
SELECT COUNT(1) AS total_links, SUM(CASE WHEN link_id IS NOT NULL THEN 1 ELSE 0 END) AS links_found_in_map
FROM links A
LEFT JOIN here_gis.streets_18_3 B ON LEFT(A.link_dir,-1)::bigint = B.link_id
old data: 152,822 links with data, all on map
Keeping these two issues separate for a moment I would propose two (hopefully) quick analyses:
- Investigate the length and other characteristics of the links that don't have data. I suspect they might be really short, but I don't know why path processing wouldn't give them data. There have been odd quirks in the network leading to low data at certain locations (mostly thinking of Gardiner-Lakeshore).
- re-upload the streets dataset to find where unmapped
link_dirs are occurring, if this is a boundary issue or something else.
Or we could just prepare two lists of links and ask our vendor what's up.
I'll take a crack at 2. later today
Took a look at links with missing data for functional class 4 and above. Red means that at least one valid direction is missing data in January 2020. Functional class 5 links look much worse (not pictured here for clarity).

DROP TABLE aharpal.here_19_4_datacheck;
CREATE TABLE aharpal.here_19_4_datacheck AS
WITH links_data AS (
SELECT LEFT(link_dir,-1)::bigint AS link_id, COUNT(1) AS num_links_w_data
FROM here.ta_202001
GROUP BY LEFT(link_dir,-1)::bigint
)
SELECT ROW_NUMBER() OVER (ORDER BY link_id) AS row_num,
link_id,
A.geom,
B.func_class,
(CASE WHEN B.dir_travel = 'B' THEN 2 ELSE 1 END > COALESCE(C.num_links_w_data,0)) AS missing_data
FROM here_gis.streets_19_4 A
INNER JOIN here_gis.streets_att_19_4 B USING (link_id)
LEFT JOIN links_data C USING (link_id)
WHERE ar_auto = 'Y' AND func_class NOT IN ('5');
ALTER TABLE aharpal.here_19_4_datacheck ADD PRIMARY KEY (row_num);
For comparison, here_gis.streets_18_3 using old data:

We are encountering the same problem again with 21_1 map version. This time we are using the map version we received from TC. However:
- we are receiving data from TA for old links
- there are links in the new TC map for which we are receiving no TA data
- there are links for which we are receiving less TA data for the same month
Quick example:
-- Example using an updated link on Danforth
-- Old Link
-- returns link in danforth
select * from here_gis.streets_att_19_4_tc where link_id = 995444929
-- there is data
select * from here.ta_201901
where link_dir = '995444929F'
limit 10
-- New links in 21_1 that make up link_dir '995444929F'
--returns link in danforth (shows up in new map 21_1)
select * from here_gis.streets_att_21_1 where link_id in (1291285372, 1291285373)
-- returns null (does not show up in old map 19_4_tc)
select * from here_gis.streets_att_19_4_tc where link_id in (1291285372, 1291285373)
-- but there is no data for the new links 21_1
select * from here.ta
where link_dir = '1291285373F'
Next step: Update map_version_comparison notebook Important Note: We need to have a more systematic check on this particular issue every time we get a new map version.
Where are the links with no data located?
Most of the links without data for version 19_4_tc are located on the edge of the toronto boundary, and there are about 6 distinct segments located within the boundary of toronto. Most of links without data for version 21_1 also located on the edge of the toronto boundary, however there are significantly more segments without data located within the the boundary of toronto.
Version 19_4_tc - Links without data
Version 21_1 - Links without data

Those links at the Toronto boundary seem perpendicular to it. I'm inferring that the geometry is being included in what we call "Toronto" through the clipping process when they are in fact outside the jurisdiction for which we receive traffic data.