bdit_data-sources icon indicating copy to clipboard operation
bdit_data-sources copied to clipboard

Btdag

Open webgisgeek opened this issue 4 years ago • 11 comments

What this pull request accomplishes:

  • review and merge changes

Issue(s) this solves:

  • documentation on creating new bt routes

What, in particular, needs to reviewed:

webgisgeek avatar Feb 09 '21 16:02 webgisgeek

@KatiRG @chmnata if you could review the route updating documentation here that would be great.

radumas avatar Feb 26 '21 03:02 radumas

@webgisgeek Hi Mohan, the readme is pretty clear, I edited a few minor things but there were some things that I didn't understand:

  • In this update, there exists an Excel Sheet template that contains details of newly added bluetooth detectors (update/img/template.PNG). The details include proposed route name, description, intersection name (BDIT convention), sensor id and lat/lon at start point, and sensor id and lat/lon at the end points along with numerous other fields. => there is no sensor id column in the Start point section

  • The routes that were updated by adding this batch of new detectors were named with a prefix "DT3"._ => screenshot does not show the new detectors DT3_

  • Depending on how many new readers there are, it may be worthwile to develop an automated process in PostgreSQL, but this update was done manually in the Excel template. In addition to the lat/lon, the streets where the readers are is also needed in 2 columns, and the segment name. => What are the 2 columns called? There is no "segment name" in the template.

  • The segment name is always the first two letters of each street, with the corridor street first and the intersecting street second. For example, a reader at Bloor/Christie measuring travel times on Bloor would be named BL_CH. For each proposed route, the excel sheet is populated with the start reader, end_reader and assigned a unique analysis_id. For this batch of new readers, analysis ids starting from 1600000 were added. => There are no columns start reader, end_reader and analysis_id in the template.

  • Therefore the reader table will have the following fields populated: analysis_id, street, direction, from_street, to_street, from_id, to_id, start_point_lat, start_point_lon, end_point_lat and end_point_lon. => these columns are not in the screenshot. Where are they supposed to be added? Also, there are many columns in the template that are not explained. e.g. !ozone, !zonelist, Maxtime, etc

  • After uploading excel data to PostgreSQL => to where? bluetooth.all_analyses ?

  • Should the query SELECT DISTINCT mohan.new_added_detectors.from_id::integer AS bluetooth_id have CREATE TABLE mohan.bluetooth_nodes AS at the beginning?

  • especially for oddly shaped intersections. => can you give an example of oddly-shaped intersections? What happens with them? How to fix them?

  • The table is now ready to be appended to the existing routes table. => can you show the query to append the table?

KatiRG avatar Mar 03 '21 16:03 KatiRG

check_brokenreaders.py should be renamed to bluetooth_check_readers.py so that the data source is the first word. I should update the documentation to specify a standard for how we name DAGs.

radumas avatar May 31 '21 13:05 radumas

Seems like the below two DAGs have been deleted and are no longer in the branch. I will rm them in the dag folder on the ec2 as well. bt_read_history.py -> /etc/airflow/dev_scripts/dags/bt_read_history.py bt_test_dag.py -> /etc/airflow/dev_scripts/dags/bt_test_dag.py

chmnata avatar Feb 16 '22 16:02 chmnata

For function reader_status_history, there is a join to a detectors_history_final table which was not documented in the readme. There are also 5 versions of it in the schema currently. The purpose of the join is to get the reader_id from detectors_history_final using read_name = detector_name. However, there are duplicated read_name in detectors_history_final as it contains all the past read_name for every reader_id. I don't think the historical read_name matters for this function as we are only running the route configurations that are currently in place. So I checked out another table called reader_locations and it seems to have the same read_name and detector_name relationship as the detectors_history_final table, but without the historic values. I think this would be a more suitable table to get the reader_id unless there is a look up table somewhere else...

chmnata avatar Mar 22 '22 18:03 chmnata

List of modified things: SQL

  • Functions:
    • broken readers: simplified, need to know which version of detectors_history_corrected should be used
    • insert report date: simplified
    • reader locations dt update: should be deleted
    • reader status history: simplified, need to know which version of detectors_history_final should be used
  • Table:
    • added length column to mohan.bt_segments_new
    • OWNER to bt_admins
    • Change create new segment to not ST_reverse geom

DAG: - updated to use sqlsensor - updated to send alert message for broken readers and not fail the task

README - added sql examples in readme - added steps to cut lines with linesubstring

Things that needs to be fixed:

  • [ ] readme is missing the explanation of detectors_history, currently in another md under update
  • [x] mohan.bt_segments_new is not ready to insert to bluetooth.segments, its missing a couple columns
  • [x] need to figure out which version of detectors_history_corrected to use then update function: broken reader and reader status history
  • [ ] sqls are only updated in github not in postgres

chmnata avatar Mar 31 '22 18:03 chmnata

Looking to move the detectors_history_* to bluetooth schema, mohan.detectors_history_corrected is the corrected version, however there are some null values on several columns. I will dig into how to populate those. e.g. some geoms are null and bt_id are null image

chmnata avatar Apr 06 '22 18:04 chmnata

Working on the new segments tables and adding the missing columns: segment_name , bluetooth, wifi, duplicate, reverse, end_street For bluetooth and wifi, according to https://github.com/CityofToronto/bdit_data-sources/blob/e98182d30de5511a2a4c246e4771f14c937c6b0f/bluetooth/sql/create_tables/segments.sql seems like bluetooth was assigned TRUE and wifi was assigned FALSE, but when I check the bluetooth.segments table there are values for wifi that are true 😕 Also not sure what reversed means, reversed geometry (?)

chmnata avatar Apr 06 '22 21:04 chmnata

Just in case future spelunkers come here. wifi = TRUE on highways where the problems associated with using WIFI observations are reduced. reversed was a flag to identify segments had been drawn against the direction of travel and which had been subsequently reversed. I'm really not sure why it stuck around. duplicate appears to have been to identify overlapping segments, which are exclusively exist on College. Since A LOT of those routes are offline, and some appear to have never been created, I am rather confused by the state of those detectors and why this was created this way in the first place. image

radumas avatar Apr 07 '22 22:04 radumas

In bluetooth.routes_temp, there are 61 routes out of 402 with no data in bluetooth.aggr_5min and bluetooth.observations.

image

But it does seems like some routes with data do have coverage on those routes with no data. (Green dotted lines are the ones without data and grey lines are the ones with data) image image

Ones that still have coverage (or partial coverage):

  • Adelaide from parliament to broadview 1495398, covered by 1454095
  • Parliament from gerrard to calton 1496472, covered by 1455692 (Parliament from dundas to calton)
  • College from ossington todufferin (1600090), sorta half covered by college from dovercourt to bathurst 1412888
  • Yonge between Chatsworth and Castlefield (1460798, 1461616), covered by various short and long routes (1428295, 1448887, 1448891, 1448413, 1428324, 1448861)
  • Sheppard from Morningside Ave to Nelson (end street was coded as sheppard incorrectly) 1460834, covered by VARIOUSSSSS short and long routes 5 in total

chmnata avatar Apr 27 '22 19:04 chmnata

Checking the distinct status in reader_history table, routepoint name does not sounds like a status type (?), added is also ???. I guess technically according to this issue https://github.com/CityofToronto/bdit_data-sources/issues/196, this table doesn't need a status column 🤔

  • "duplicate"
  • "routepoint name"
  • "online"
  • "offline"
  • "decommissioned"
  • "added"
  • "moved"

The following readme explained the meaning behind each status, kindddd of feels like the below 3 is confusing or just keeping a log of duplicated data/data of human error. https://github.com/CityofToronto/bdit_data-sources/blob/btdag/bluetooth/update/reader_status_details.md

  • added (these are the locations where names are inconsistent for example a reader with bt_id 4035 has a name E in one table and E4035 in another. Despite being at the same intersection at Gardiner & Parliament. Therefore, these two names could refer to the same reader. Thus to distinguish one got the status added and another is offline )

  • duplicate (duplicate does not have a bt_id but based on its location and/or the reader name, there are more than one at a given intersection. The one that has got less details is assigned as duplicate)

  • moved (moved has a bt_id but the same bt_id existed in a different location in the past. For example bt_id 5331 was at Gerrard & University. But the same bt_id also is at Lawrence & Mt Pleasant and is offline as per latest table. Thus 5331 has two entries with status offline and moved.

  • routepoint name (does not have a bt_id. It is a name given to a point along a route. Could there be a detector at those points ? most likely not)

chmnata avatar May 02 '22 18:05 chmnata