bdit_data-sources
Missing Miovision Dates
- [ ] January 27, 2022
- [x] June 29, 2022
at least for the RapidTO locations
Backfilling for January 27, 2022 returned the error below:
[2022-08-11 09:44:03,121] {bash_operator.py:157} INFO - 11 Aug 2022 09:44:03 INFO Bayview Avenue and River Street 2022-01-27 06:00:00
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - 11 Aug 2022 09:44:03 CRITICAL Traceback (most recent call last):
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - File "/etc/airflow/data_scripts/volumes/miovision/api/intersection_tmc.py", line 233, in get_road_class
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - return self.roaduser_class[ru_class]
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - KeyError: 'Pedestrian'
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO -
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - During handling of the above exception, another exception occurred:
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO -
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - Traceback (most recent call last):
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - File "/etc/airflow/data_scripts/volumes/miovision/api/intersection_tmc.py", line 99, in run_api
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - pull_data(conn, start_time, end_time, intersection, path, pull, key, dupes)
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - File "/etc/airflow/data_scripts/volumes/miovision/api/intersection_tmc.py", line 463, in pull_data
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - c_start_t, c_end_t)
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - File "/etc/airflow/data_scripts/volumes/miovision/api/intersection_tmc.py", line 301, in get_intersection
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - table_veh = self.process_response('tmc', response_tmc)
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - File "/etc/airflow/data_scripts/volumes/miovision/api/intersection_tmc.py", line 296, in process_response
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - + self.process_tmc_row(row) for row in data]
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - File "/etc/airflow/data_scripts/volumes/miovision/api/intersection_tmc.py", line 296, in <listcomp>
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - + self.process_tmc_row(row) for row in data]
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - File "/etc/airflow/data_scripts/volumes/miovision/api/intersection_tmc.py", line 274, in process_tmc_row
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - classification = self.get_road_class(row)
[2022-08-11 09:44:03,679] {bash_operator.py:157} INFO - File "/etc/airflow/data_scripts/volumes/miovision/api/intersection_tmc.py", line 236, in get_road_class
[2022-08-11 09:44:03,680] {bash_operator.py:157} INFO - .format(row['class']))
[2022-08-11 09:44:03,680] {bash_operator.py:157} INFO - ValueError: vehicle class Pedestrian not recognized!
Considering we don't need that location for RapidTO... wondering if we should run this manually and exclude that intersection, or run it manually for allllll the other intersections.
I can walk someone through text processing to create the command line options for all intersections if someone is stumped
Backfilled Jan 27 for all other intersections except uid 65. Tried re-pulling for 65 after Miovision rolled back the config and still got the same error.
@tankedman can you create a new script on top of the original Miovision pulling script that runs over a range of dates (start_date, end_date) and:
- logs the days that have the 'Pedestrian' error,
- logs the row that returned the error from the API,
- then skips the day that has the error and continues to pull the next day.
So that we can at least pull the days that we can pull :meow_dio:
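A minimal sketch of the skip-and-continue loop requested above, not the actual backfill script. `pull_day` is a hypothetical callable standing in for the existing per-day pull, and it is assumed to raise `ValueError` for an unrecognized class, as in the traceback earlier in this thread.

```python
import logging
from datetime import date, timedelta

logger = logging.getLogger("miovision_backfill")


def backfill(start_date, end_date, pull_day):
    """Pull every day in [start_date, end_date], skipping days that fail.

    pull_day is a hypothetical stand-in for the existing per-day pull.
    It is assumed to raise ValueError when the API returns a class the
    script does not recognize (e.g. 'Pedestrian').
    Returns a list of (day, error message) for the days that were skipped.
    """
    failed = []
    day = start_date
    while day <= end_date:
        try:
            pull_day(day)
        except ValueError as err:
            # Record the failing day and the error text, then move on.
            logger.error("Skipping %s: %s", day, err)
            failed.append((day, str(err)))
        day += timedelta(days=1)
    return failed
```

Logging the offending row itself would require the pull to include that row in the exception message; the sketch only records whatever text the exception carries.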
The "Pedestrian" error movements are now available in my schema, under jmok.miov_pederr.
Additionally, the backfill script has now pulled all non-error movements from 2021-06-16 to 2022-05-31.
Thanks for putting together the table, @tankedman
Looking at the log file from your processes, there are 484 occurrences of the warning "WARNING: You are trying to insert duplicate data!" spread across different days, from Aug 21, 2021 to the last day pulled, May 30, 2022.
For example, the message says that at 2021-08-21 11:59:00, the row with classification_id: 10, leg: S, movement_uid: 7, and volume: 1 is duplicated. In this case it is indeed a duplicate, since the row inserted first is exactly the same. That holds for 483 of the 484 duplicated rows shown in the log.
I've checked some recent airflow miovision pulling logs and did not see similar warnings.
16 Feb 2023 21:23:54 INFO Bayview Avenue and River Street 2021-08-21 06:00:00
16 Feb 2023 21:23:56 INFO Completed data pulling from 2021-08-21 06:00:00 to 2021-08-21 12:00:00
16 Feb 2023 21:23:57 WARNING WARNING: You are trying to insert duplicate data!
65, 2021-08-21 11:59:00, 10, S, 7, 1
However, there is ONE case where we got duplicated data for the same datetime, classification_id, movement_id, but with a different volume :meow_dio:
select *
from natalie.miovision_err_dups dups
inner join miovision_api.volumes using (intersection_uid, datetime_bin, classification_uid, leg, movement_uid)
where dups.volume != volumes.volume
The magical datetime is 2021-11-07 01:59:00: the inserted data shows a volume of 3, but the log says it's ONE.
65, 2021-11-07 01:59:00, 1, W, 2, 1
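The same check can be sketched outside the database, assuming the duplicate rows are read straight from the log. The field order follows the logged rows above (intersection uid, datetime, classification, leg, movement uid, volume); `DupRow`, `parse_dup_line`, and `volume_mismatches` are hypothetical names, not part of the pulling script.

```python
from datetime import datetime
from typing import NamedTuple


class DupRow(NamedTuple):
    intersection_uid: int
    datetime_bin: datetime
    classification_uid: int
    leg: str
    movement_uid: int
    volume: int


def parse_dup_line(line):
    """Parse a logged row such as '65, 2021-08-21 11:59:00, 10, S, 7, 1'.

    Returns None for any line that is not a six-field duplicate row.
    """
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 6:
        return None
    try:
        return DupRow(
            int(parts[0]),
            datetime.strptime(parts[1], "%Y-%m-%d %H:%M:%S"),
            int(parts[2]),
            parts[3],
            int(parts[4]),
            int(parts[5]),
        )
    except ValueError:
        return None


def volume_mismatches(dups, inserted):
    """Return logged duplicates whose volume differs from the inserted row.

    inserted maps the first five fields of a row to the stored volume.
    """
    mismatched = []
    for dup in dups:
        stored = inserted.get(dup[:5])
        if stored is not None and stored != dup.volume:
            mismatched.append(dup)
    return mismatched
```

With the two rows quoted in this thread, only the 2021-11-07 01:59:00 row (logged volume 1, stored volume 3) would be reported.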
We should ask miovision about this on top of the peds error :meow_dead:
Peds error summary:
- 290 of the 349 days pulled between 2021-06-16 and 2022-05-31 returned at least one error from the Pedestrian class, ranging from 18 rows to 232!
@radumas is this still fixable? Given https://github.com/CityofToronto/bdit_data-sources/issues/287#issuecomment-1677316909 it would seem like this is now historical data that is unlikely to be recoverable?
Different error
Added in anomalous_ranges uid: 897