Data sources used by the Big Data Innovation Team

BDIT Data Sources

This is the primary repository for code and documentation for most of the data sources the Data & Analytics Unit uses.

Each folder is for a different data source (or category of related data sources). They contain:

  • an explanation of what the data source is,
  • how it can be used, and
  • the Python and SQL necessary for our Extract, Load, Transform, and Validate processes into our PostgreSQL database.

For those curious about which of the data we manage are released on Open Data, see the Open Data Releases.

Table of Contents

  • Airflow DAGS
  • Bluetooth Detectors
  • Collisions
  • Cycling App (inactive)
  • Events (inactive)
  • GIS - Geographic Data
    • Assets
      • Red Light Cameras
      • Traffic Signals
    • School Safety Zones
    • Street Centreline Geocoding
  • HERE Travel Time Data
  • Incidents (inactive)
  • INRIX (inactive)
  • Parking (inactive)
  • Road Closure (inactive)
  • TTC (inactive)
  • Volume Data
    • Miovision - Multi-modal Permanent Video Counters
    • RESCU - Loop Detectors (inactive)
    • Short-term Counting Program
    • Vehicle Detector Station (VDS)
  • Watch Your Speed signs
  • Weather
  • Open Data Releases

Airflow DAGS

dag/

This folder contains the DAG Python files for our Airflow orchestration that dictate the logic and schedule for data pipeline tasks.

Bluetooth Detectors

bluetooth/

The City collects traffic data from sensors placed strategically at intersections and along highways. The sensors detect the Bluetooth MAC addresses of vehicles as they drive by, and the addresses are immediately anonymized. When the same MAC address is detected at two sensors, the travel time between those sensors is calculated.
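The matching logic can be illustrated with a minimal Python sketch. It is not the City's implementation: the salted SHA-256 hash, the `SALT` value, the sensor identifiers and the function names are all assumptions made for illustration, and the actual anonymization method is not described here.

```python
import hashlib
from datetime import datetime

# Hypothetical salt: the real anonymization scheme is not documented in this README.
SALT = b"example-salt"


def anonymize(mac: str) -> str:
    """Return a salted SHA-256 digest so the raw MAC address is never stored."""
    return hashlib.sha256(SALT + mac.lower().encode("ascii")).hexdigest()


def travel_times(detections, start_sensor, end_sensor):
    """Pair each device's detection at start_sensor with its next one at end_sensor.

    detections: iterable of (anonymized_id, sensor_id, datetime).
    Returns a list of (anonymized_id, travel_time_in_seconds).
    """
    last_start = {}
    results = []
    # Process detections in chronological order.
    for device, sensor, ts in sorted(detections, key=lambda d: d[2]):
        if sensor == start_sensor:
            last_start[device] = ts
        elif sensor == end_sensor and device in last_start:
            seconds = (ts - last_start.pop(device)).total_seconds()
            results.append((device, seconds))
    return results


# Made-up detections: device 01 passes sensors A then B, device 02 only passes A.
detections = [
    (anonymize("AA:BB:CC:DD:EE:01"), "A", datetime(2024, 1, 1, 8, 0, 0)),
    (anonymize("AA:BB:CC:DD:EE:02"), "A", datetime(2024, 1, 1, 8, 0, 30)),
    (anonymize("AA:BB:CC:DD:EE:01"), "B", datetime(2024, 1, 1, 8, 2, 30)),
]
times = travel_times(detections, "A", "B")
```

Only the device seen at both sensors produces a travel time. A production process would also need to filter outliers, such as vehicles that stop between the two sensors, which this sketch ignores.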

Collisions

collisions/

The collisions dataset consists of data on individuals involved in traffic collisions from approximately 1985 to the present day (some earlier historical collisions are also included).

Cycling App (inactive)

cycling_app/

The Cycling App collected origin-destination (OD) and trip data until 2016.

Events (inactive)

events/

How do special events impact traffic in the city? Data sources include the City's Open Data and TicketMaster.

GIS - Geographic Data

Assets

assets/

The assets directory stores Airflow processes related to various assets that we help manage, such as datasets related to Vision Zero. Below are the assets that we have automated so far.

Red Light Cameras

assets/rlc/

Red Light Camera data are obtained from Open Data and are among the indicators displayed on the Vision Zero Map and Dashboard. We have developed an Airflow process that automatically connects to Open Data and stores the data in our RDS Postgres database. See the README file in assets/rlc for details about this process.

Traffic Signals

assets/traffic_signals/

A number of different features of traffic signals (Leading Pedestrian Intervals, Audible Pedestrian Signals, Pedestrian Crossovers, Traffic Signals) are periodically pulled from Open Data. These indicators are used to populate the Vision Zero Map and Dashboard. See the README file in assets/traffic_signals for details about the source datasets and how they are combined into a final table.

School Safety Zones

gis/school_safety_zones/

This dataset comes from Vision Zero, which uses Google Sheets to track progress on the implementation of safety improvements in school zones.

Street Centreline Geocoding

gis/text_to_centreline/

Contains SQL used to transform text descriptions of streets (in bylaws) into centreline geometries.

HERE Travel Time Data

here/

Travel time data provided by HERE Technologies, derived from a mix of vehicle probes. The data arrive as daily extracts of speed data aggregated into 5-minute bins for each link in the city (where data are available).

Incidents (inactive)

See CityofToronto/bdit_incidents

INRIX (inactive)

inrix/

Data collected from a variety of traffic probes from 2007 to 2016 for major streets and arterials.

Parking (inactive)

parking/

This contains R and SQL files for pulling parking lots and parking tickets from Open Data. They might be useful but haven't been documented or automated.

Road Closure (inactive)

road_closure/

This directory contains a Python file to pull and parse the XML feed of road closures. This process hasn't been automated (and more recent versions of the API use JSON).

TTC (inactive)

ttc/

This contains some valiant attempts at transforming the CIS vehicle location data for streetcars provided to us by the TTC, as well as an automated process for pulling in GTFS schedule data.

Volume Data

volumes/

Miovision - Multi-modal Permanent Video Counters

volumes/miovision/

Miovision provides volume counts gathered by cameras installed at specific intersections; there are currently 32 intersections in total. Miovision processes the video footage and provides volume counts aggregated into 1-minute bins. The 1-minute turning movement count (TMC) data are available in miovision_api.volumes, the 15-minute TMC data are in miovision_api.volumes_15min_tmc, and the 15-minute automatic traffic recorder (ATR) data are in miovision_api.volumes_15min.
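As an illustration of the relationship between the 1-minute and 15-minute bins, here is a minimal Python sketch that sums 1-minute volumes into 15-minute bins. The function name and the row layout (intersection ID, timestamp, volume) are assumptions for illustration. The actual aggregation lives in the SQL in volumes/miovision/ and handles details, such as movements and classifications, that this sketch omits.

```python
from collections import defaultdict
from datetime import datetime


def to_15min_bins(rows):
    """Sum 1-minute volumes into 15-minute bins per intersection.

    rows: iterable of (intersection_id, datetime_bin, volume) in 1-minute bins.
    Returns {(intersection_id, bin_start): total_volume}.
    """
    totals = defaultdict(int)
    for intersection, ts, volume in rows:
        # Floor the timestamp to the start of its 15-minute bin.
        bin_start = ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)
        totals[(intersection, bin_start)] += volume
    return dict(totals)


# Made-up rows: two observations fall in the 08:00 bin, one in the 08:15 bin.
rows = [
    (1, datetime(2024, 1, 1, 8, 0), 4),
    (1, datetime(2024, 1, 1, 8, 14), 6),
    (1, datetime(2024, 1, 1, 8, 15), 3),
]
bins = to_15min_bins(rows)
```

Here the 08:00 bin totals 10 and the 08:15 bin totals 3.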

RESCU - Loop Detectors (inactive)

volumes/rescu/

Deprecated. See Vehicle Detector Station (VDS).

Short-term Counting Program

volumes/short_term_counting_program/

Short-term traffic counts are conducted on an ad-hoc basis as the need arises, and may be done throughout the year both at intersections and mid-block. Much of this dataset is also available through the internal application MOVE and data go as far back as 1994.

Vehicle Detector Station (VDS)

volumes/vds/

The city operates various permanent Vehicle Detector Stations (VDS) that employ different technologies, including RESCU, intersection detectors, Blue City and Smartmicro. The one most frequently used by D&A is the RESCU network, which tracks traffic volumes on Toronto expressways; more information about it can be found on the city's website or here.

Watch Your Speed signs

wys/

The city has installed Watch Your Speed signs that display the speed a vehicle is travelling at and flash if the vehicle is travelling over the speed limit. The signs were installed as part of two programs: the mobile Watch Your Speed program, which has signs mounted on existing poles and moved every few weeks, and the school Watch Your Speed program, which has signs installed at high-priority schools. The signs also collect continuous speed data.

Weather

weather/

Daily historical weather conditions and predictions from Environment Canada.

Open Data Releases

  • Travel Times - Bluetooth contains data for all the Bluetooth segments collected by the city. The travel times are 5-minute average travel times. The real-time feed is currently not operational. See the Bluetooth README for more info.
  • Watch Your Speed Signs give feedback to drivers to encourage them to slow down; they also record the speed of vehicles passing by the sign. Semi-aggregated and monthly summary data are available for the two programs (Stationary School Safety Zone signs and Mobile Signs) and are updated monthly. See the WYS README for links to these datasets.

For the King St. Transit Pilot, the team has released the following datasets, which are typically a subset of larger datasets specific to the pilot: