
determine_working_machine improperly handles mass data outages

Open cczhu opened this issue 4 years ago • 3 comments

On 2020-12-14 there was a partial outage of Miovision data. From 18:30-18:45 some stations went offline, and from 18:45-19:00 all stations were offline. This can be seen using

-- Total volume per time bin around the outage window
SELECT datetime_bin,
       SUM(volume) AS total_vol
FROM miovision_api.volumes
WHERE datetime_bin BETWEEN '2020-12-14 18:30:00' AND '2020-12-14 19:15:00'
GROUP BY datetime_bin
ORDER BY 1;

This was confusingly reported by the check_miovision Airflow process:

[image: screenshot of the check_miovision notices]

The contradictory notices come from determine_working_machine: the first comes from tot, which checks whether any station has an invalid gap, while the second comes from the contents of mis, which holds the invalid gaps found for any station. When every station is down, however, determine_working_machine associates the affected timestamps with NULL intersection and volume UIDs, and inserts a row with intersection_uid = NULL into mis.
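For anyone who wants to see how a NULL intersection_uid row can appear, here is a minimal sketch. It is not the actual body of determine_working_machine; it assumes the timestamps are outer-joined against the recorded volumes, and it uses an in-memory SQLite database with a hypothetical expected_bins table and made-up volumes purely for illustration.

```python
import sqlite3

# In-memory stand-in for the real database; table names other than
# "volumes" and all inserted values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE expected_bins (datetime_bin TEXT);
    CREATE TABLE volumes (volume_uid INTEGER, intersection_uid INTEGER,
                          datetime_bin TEXT, volume INTEGER);
    INSERT INTO expected_bins VALUES
        ('2020-12-14 18:30:00'), ('2020-12-14 18:45:00'), ('2020-12-14 19:00:00');
    -- No station reports anything for the 18:45 bin.
    INSERT INTO volumes VALUES
        (1, 1, '2020-12-14 18:30:00', 40),
        (2, 1, '2020-12-14 19:00:00', 35),
        (3, 2, '2020-12-14 19:00:00', 50);
""")

# An outer join keeps the 18:45 bin even though nothing matches it,
# so its intersection_uid and volume_uid come back as NULL (None).
rows = conn.execute("""
    SELECT b.datetime_bin, v.intersection_uid, v.volume_uid
    FROM expected_bins b
    LEFT JOIN volumes v ON v.datetime_bin = b.datetime_bin
    ORDER BY b.datetime_bin, v.intersection_uid
""").fetchall()

# Rows that cannot be attributed to any single station.
null_uid_rows = [r for r in rows if r[1] is None]
```

Under these assumptions, the bin where every station is silent is the only one left with no intersection to attach it to, which matches the NULL row ending up in mis.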

Ideally we would find a way to distinguish rows with intersection_uid = NULL after they're generated in this CTE. One simple option would be to replace NULL with -1, 999 or 'ALL' (if we're willing to return a string rather than an int in mis). We'd also need to sort out the message handling for tot, since it shouldn't report "All cameras are working fine" in those cases.
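A rough sketch of what the sentinel idea could look like on the reporting side. The function name summarize_gaps, the ALL_STATIONS constant, the row layout of mis and the outage message wording are all assumptions for illustration, not the current code; only the "All cameras are working fine" text comes from the existing notice.

```python
from typing import List, Optional, Tuple

# Hypothetical sentinel standing in for "every station" (the -1 option).
ALL_STATIONS = -1

# Assumed row shape for mis: (intersection_uid, gap_start, gap_end).
Gap = Tuple[Optional[int], str, str]


def summarize_gaps(mis: List[Gap]) -> str:
    """Build one notice from the gap rows, so the all-clear message and
    the gap list can never be reported together."""
    if not mis:
        return "All cameras are working fine"
    lines = []
    for uid, start, end in mis:
        # Map a NULL intersection_uid to the sentinel before reporting.
        uid = ALL_STATIONS if uid is None else uid
        if uid == ALL_STATIONS:
            lines.append(f"All stations: no data from {start} to {end}")
        else:
            lines.append(f"intersection_uid {uid}: no data from {start} to {end}")
    return "\n".join(lines)
```

Deriving both the all-clear text and the per-station text from the same rows is one way to avoid the contradictory notices; whether the mapping belongs in SQL (e.g. via COALESCE) or in the Airflow task is left open here.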

I have no idea why the script didn't report the partial outages from 18:30 - 18:45.

cczhu avatar Dec 15 '20 17:12 cczhu

Uh oh, the same thing happened today.

-- Same check for the following day
SELECT datetime_bin,
       SUM(volume) AS total_vol
FROM miovision_api.volumes
WHERE datetime_bin BETWEEN '2020-12-15 18:30:00' AND '2020-12-15 19:15:00'
GROUP BY datetime_bin
ORDER BY 1;

shows a partial gap in data from 18:30 - 18:45, followed by no data from 18:45 - 19:00. I had a look at a few stations using the Miovision API to check whether this is a processing issue on our end, and the data is missing there too. E.g., the Bay & Bloor station is missing data from 2020-12-15 18:34 - 2020-12-15 18:59. Curiously, there is no evidence of missing data from yesterday, but I didn't check the API directly yesterday.

E-mailing Brett.

cczhu avatar Dec 16 '20 14:12 cczhu

According to Brett Rogerson (e-mail 2020-12-16) this is occurring because:

There was a common process that was running in the new Server search tool and for some reason it caused an error in how it posted the data for Trafficlink and the API.

Brett assures us this won't happen tomorrow. I'll let him know if it does.

cczhu avatar Dec 16 '20 19:12 cczhu

Encountered another one of these on 2021-05-05: we see partially missing data (all light vehicles missing from some locations) starting from 2021-05-04 19:30, followed by a total lack of data from 19:45-20:00. Annoyingly, pulling the data again does give us counts from 19:45-20:00, so I've contacted Brett to see if this is a glitch with the Miovision servers. If not, I'll have to autopsy the pulling procedure on our end.

Update: Brett confirms there was a server issue on their end. Miovision is investigating. In the meantime, I re-pulled following these notes.

cczhu avatar May 05 '21 13:05 cczhu