bdit_data-sources
bdit_data-sources copied to clipboard
Miovision - Missing Data + Cameras Down
Three issues here:
- There are a number of intersections which need to be backfilled with data (outlined here: #416)
- There are intersections that are currently down.
- There are intersections that need to be added to
miovision.intersections
1. Intersections requiring backfill
List of intersections that should likely be re-pulled via API based on difference between install date and first available date of data:
WITH earliest_date AS (
SELECT intersection_uid, MIN(datetime_bin)::date AS dt
FROM miovision_api.volumes_15min
GROUP BY intersection_uid
)
SELECT intersection_uid, intersection_name, dt AS date_available, date_installed
FROM miovision_api.intersections
INNER JOIN earliest_date USING (intersection_uid)
ORDER BY intersection_uid
intersection_uid | intersection_name | date_available | date_installed |
---|---|---|---|
58 | Bloor Street West and Dufferin Street | 06/21/2021 | 12/22/2020 |
59 | Eglinton Avenue West and Jane Street | 06/21/2021 | 06/02/2021 |
60 | Harbord Street/St. George Street/Hoskin Street | 06/21/2021 | 05/14/2021 |
61 | Jane Street and Lawrence Avenue West | 06/21/2021 | 06/08/2021 |
62 | Jane Street and Wilson Avenue | 06/21/2021 | 06/09/2021 |
63 | Sheppard Avenue West and Weston Road | 06/21/2021 | 05/14/2021 |
64 | Yonge Street / Davenport Road / Church St | 06/21/2021 | 05/14/2021 |
2. Intersections currently down (based on Trafficlink and data in RDS):
- Jane and Steeles: down as of August 6, 2021 (unsure when installed)
- Jane and Eglinton: down as of July 2, 2021 (installed on June 2, 2021)
- Wellington and Bay: down as of April 3, 2021
3. Intersections not in database
- Jane and Steeles
- Bayview and River
When running the intersection_tmc script to populate volumes for intersections 67 (Sheppard & Keele) and 68 (Steeles & Jane), the volumes table is not populated and the script outputs "INFO Sheppard Avenue West and Keele Street not active on 2021-06-16 06:00:00" For all times within the selected period (one day from their install date). Likely the cameras were not active. @radumas @chmnata
@tankedman Archived documentation is out of date, please run the aggregation part with the following functions in order:
miovision_api.find_gaps
miovision_api.aggregate_15_min_mvt
miovision_api.aggregate_15_min
miovision_api.report_dates
Documenting my confusion here so we can discuss on Monday:
Jedwin's nohup.out shows that miovision_api.aggregate_15_min_mvt
is trying insert a duplicated value
ERROR: duplicate key value violates unique constraint "volumes_15min_mvt_intersection_uid_datetime_bin_classificat_key"
DETAIL: Key (intersection_uid, datetime_bin, classification_uid, leg, movement_uid)=(1, 2022-07-20 23:00:00, 1, E, 1) already exists.
But when I check the volumes table, there are no rows with volume_15min_mvt_uid is null
in the table, which means this shouldn't be aggregated at all since there is a where clause in the function that only runs volumes where volume_15min_mvt_uid is null
is null.
select * from miovision_api.volumes
where intersection_uid = 1 and datetime_bin::Date = '2022-07-20' and volume_15min_mvt_uid is null
There is no parameter to specific intersections to aggregate, meaning the function should know not to aggregate and insert data that was already aggregated (aka having a volume_15min_mvt_uid)
So there is indeed a bug in the aggregation function. Because of the 0-padding of null-volumes it will try to insert a 0 where there might already be 0 data.
https://github.com/CityofToronto/bdit_data-sources/blob/master/volumes/miovision/sql/function-aggregate-volumes_15min_mvt.sql#L47-L49
That is because the aggregation query starts with the cross product of intersection X time-bins (from generate_series
) for the date(s) to aggregate. It then left joins on existing volumes excluding A.volume_15min_mvt_uid IS NULL
, but there are two conditions in which that field may be NULL
:
- the row exists and the column is null
- the row doesn't exist
There's a subsequent HAVING
clause that should exclude 0-volumes from being inserted... except for the classifications we deem to be important.
I can think of two solutions:
- add an anti-join to this insert query to exclude intersection X time-bins that have already been populated (and I guess stop relying on this
volume_15min_mvt_uid
foreign key relationship - Insert 0s into
volumes
where they've been zero-padded.
Deleting volumes for specific intersections to re-run the CLI won't work because this function still zero-pad the other intersections on that date
As discussed with @radumas, we can add a filter for the intersection_uid we are aggregating for in the function's sql and run that first . @tankedman wanna try creating the query?
Agg. process complete for intersections 65 and 66 from 07-21-2022 to 08-15-2022. Next step is to backfill to start date 06-16-2021