bdit_data-sources
bdit_data-sources copied to clipboard
Miovision: discrepencies between volumes_15min_mvt and unacceptable_gaps
Here are 12 dates with records that should be nulls but are not. You were lucky to pick one of them at random! I will create a new issue for this investigation.
"datetime_bin" | "count" |
---|---|
"2024-02-05" | 46 |
"2024-02-01" | 43 |
"2024-01-20" | 3990 |
"2024-01-19" | 4032 |
"2024-01-18" | 4032 |
"2024-01-17" | 4032 |
"2024-01-16" | 4032 |
"2024-01-15" | 4032 |
"2024-01-14" | 4032 |
"2024-01-13" | 4032 |
"2024-01-02" | 44 |
"2023-01-22" | 42 |
SELECT datetime_bin::date, COUNT(*)
FROM miovision_api.volumes_15min_mvt AS v
JOIN miovision_api.unacceptable_gaps AS un USING (intersection_uid, datetime_bin)
WHERE v.volume IS NOT NULL
GROUP BY 1
ORDER BY 1 DESC
Originally posted by @gabrielwol in https://github.com/CityofToronto/bdit_data-sources/issues/863#issuecomment-1938939546
All these discrepencies were caused by 5 unacceptable_gap entries:
"dt" | "intersection_uid" | "gap_start" | "gap_end" |
---|---|---|---|
"2023-01-22" | 15 | "2023-01-22 23:57:00" | "2023-01-23 00:04:00" |
"2024-01-02" | 60 | "2024-01-02 10:21:00" | "2024-01-03 00:00:00" |
"2024-01-12" | 6 | "2024-01-13 00:00:00" | "2024-01-20 23:45:00" |
"2024-02-02" | 7 | "2024-02-01 23:45:00" | "2024-02-02 22:57:00" |
"2024-02-05" | 41 | "2024-02-05 23:52:00" | "2024-02-06 00:17:00" |
SELECT DISTINCT un.dt, un.intersection_uid, un.gap_start, un.gap_end
FROM miovision_api.volumes_15min_mvt AS v
JOIN miovision_api.unacceptable_gaps AS un USING (intersection_uid, datetime_bin)
WHERE v.volume IS NOT NULL
AND v.datetime_bin >= '2023-01-02' AND v.datetime_bin < '2024-02-06'
The first/fourth/fifth record above are because the find_gaps script identifies gaps that overlap midnight, but started the day before. I didn't anticipate that this would make the pipeline non-idempotent. I will remove this behaviour now at the minor expense of missing small gaps that start over midnight.
The other two issues I can't think of a cause for! I can re-aggregate these 2 ranges and we can keep our eyes out for similar issues in the future.