bdit_data-sources

Refactor weather pipelines

Open radumas opened this issue 2 years ago • 8 comments

We should have a weather folder in this repo and a corresponding weather schema to hold our weather tables.

radumas avatar Jun 08 '22 15:06 radumas

Python package for pulling Environment Canada data: https://pypi.org/project/env-canada/#description

chmnata avatar Jun 09 '22 17:06 chmnata

Historical Daily Weather script here: https://github.com/Toronto-Big-Data-Innovation-Team/activeto/blob/jasonlee/weekend_closures/scripts/import_weather.py

Things to add/change:

  • [x] change destination table to a table in weather schema
  • [x] change table name to historical_daily
  • [x] change the script to run daily and not monthly
  • [ ] create a DAG that runs daily with separate tasks for 1) pulling the data and 2) inserting the data into our database, plus a Slack alert as the failure callback
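A minimal, framework-free sketch of the pull/insert split described in the last item. The function names, the record fields, and the sample values are all hypothetical. In an Airflow DAG each function would presumably become its own task, and the `on_failure` hook would correspond to an `on_failure_callback` that posts to Slack; none of that wiring is shown here.

```python
def pull_weather(run_date):
    """Hypothetical stand-in for the task that fetches one day of data."""
    # Made-up values; a real task would call the weather source here.
    return {"dt": run_date, "temp_max": 14.0, "temp_min": 6.5}


def insert_weather(record, table):
    """Hypothetical stand-in for the task that writes to the database."""
    table.append(record)


def run_daily(run_date, table, pull=pull_weather, insert=insert_weather,
              on_failure=print):
    """Run pull then insert; report any failure through on_failure."""
    try:
        record = pull(run_date)
        insert(record, table)
    except Exception as exc:
        # Stand-in for the Slack alert: the callback receives a message.
        on_failure(f"weather pipeline failed for {run_date}: {exc}")
        return False
    return True


def broken_pull(run_date):
    """Simulates the weather source being unreachable."""
    raise ConnectionError("source unavailable")


alerts = []
table = []
ok = run_daily("2022-10-13", table, on_failure=alerts.append)
failed = run_daily("2022-10-14", table, pull=broken_pull,
                   on_failure=alerts.append)
```

Keeping the pull and the insert as separate steps means a failed insert can be retried without hitting the weather source again.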

chmnata avatar Jun 09 '22 19:06 chmnata

Currently the historical table has the following columns: weather_uid, climate_id, dt, temp_max, temp_min, temp_mean, total_precip_mm

@tankedman mentioned that condition (e.g. Partly Cloudy) and wind speed are also available, so we are adding those to the table as well

Wondering if the weather_uid, and climate_id columns are necessary 🤔

Can you also add a unique constraint on dt so we don't insert duplicate data, as well as an index on dt? Thanks!!
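A small sketch of what the unique constraint on dt buys us, using an in-memory SQLite database as a stand-in for the real one. The table and column names come from this thread; the column types and the sample values are assumptions.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE historical_daily (
        dt TEXT NOT NULL UNIQUE,  -- one row per calendar day
        temp_max REAL,
        temp_min REAL,
        temp_mean REAL,
        total_precip_mm REAL
    )
    """
)


def insert_day(conn, row):
    """Insert one day of weather; return False if dt already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO historical_daily VALUES (?, ?, ?, ?, ?)", row
            )
        return True
    except sqlite3.IntegrityError:
        # Raised when the UNIQUE constraint on dt is violated.
        return False


# Made-up sample values for the same date, inserted twice.
first = insert_day(conn, ("2022-10-13", 14.0, 6.5, 10.3, 0.0))
second = insert_day(conn, ("2022-10-13", 15.0, 7.0, 11.0, 1.2))
row_count = conn.execute(
    "SELECT COUNT(*) FROM historical_daily"
).fetchone()[0]
```

If the target database is PostgreSQL (the thread does not say), a UNIQUE constraint already creates a unique index on the column, so a separate index on dt would likely be redundant.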

chmnata avatar Oct 13 '22 21:10 chmnata

Just fyi: weather_uid and climate_id are references to the Environment Canada database. Yes, will add the new columns and impose UNIQUE on dt.

Created two tables in weather schema:

  • historical_daily: tracks weather on a daily basis, will be pulled at end of day by script
  • prediction_daily: tracks weather prediction on a daily basis, based on the prediction from the previous day

tankedman avatar Oct 13 '22 21:10 tankedman

ahh I see, yea I think we can exclude the weather_uid and climate_id

chmnata avatar Oct 13 '22 21:10 chmnata

Added a weather_bot for the DAG, connection added on airflow

chmnata avatar Oct 27 '22 20:10 chmnata

  1. Unable to access the historical weather classes ECHistorical and ECHistoricalRange in the env_canada Python package, so currently only able to pull the current day's weather. Will likely have to scrape Environment Canada manually.

  2. The env_canada package is only able to get the next 5 days of forecast. Will modify prediction_import.py to pull 5 days at a time, overwriting previous dates.

tankedman avatar Oct 28 '22 14:10 tankedman

As per discussion with @tahaislam and @tankedman, modifications needed :meow_salute: :

Prediction script:

  • [x] change from pulling 1 day of data to pulling all 5 available days
  • [x] add a column for date_pull (date) in the prediction table
  • [x] insert 5 days of data with upsert script, overwriting data with the same date and updating the column date_pull
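The upsert described in the list above could look roughly like this. It is a sketch against an in-memory SQLite database, not the actual prediction_import.py: the prediction_daily name and the date_pull column come from this thread, while the other columns, the types, and the forecast values are made up for illustration.

```python
import sqlite3
from datetime import date, timedelta

# In-memory SQLite stands in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE prediction_daily (
        dt TEXT PRIMARY KEY,      -- forecast date, one row per day
        temp_max REAL,
        temp_min REAL,
        date_pull TEXT NOT NULL   -- date the forecast was pulled
    )
    """
)

# On a dt conflict, overwrite the forecast and refresh date_pull.
UPSERT = """
    INSERT INTO prediction_daily (dt, temp_max, temp_min, date_pull)
    VALUES (?, ?, ?, ?)
    ON CONFLICT (dt) DO UPDATE SET
        temp_max = excluded.temp_max,
        temp_min = excluded.temp_min,
        date_pull = excluded.date_pull
"""


def upsert_forecast(conn, forecast, date_pull):
    """Upsert (dt, temp_max, temp_min) tuples pulled on date_pull."""
    rows = [
        (dt.isoformat(), hi, lo, date_pull.isoformat())
        for dt, hi, lo in forecast
    ]
    with conn:
        conn.executemany(UPSERT, rows)


def fake_forecast(start, offset):
    """Hypothetical 5-day forecast starting at `start` (made-up values)."""
    return [
        (start + timedelta(days=i), 10.0 + i + offset, 2.0 + i + offset)
        for i in range(5)
    ]


# Two consecutive daily pulls; four of the five dates overlap.
upsert_forecast(conn, fake_forecast(date(2022, 10, 28), 0.0), date(2022, 10, 28))
upsert_forecast(conn, fake_forecast(date(2022, 10, 29), 0.5), date(2022, 10, 29))
rows = conn.execute(
    "SELECT dt, date_pull FROM prediction_daily ORDER BY dt"
).fetchall()
```

After the two pulls the table holds six dates: the first keeps its original date_pull because the second pull no longer covers it, and every overlapping date carries the later date_pull. PostgreSQL accepts the same `ON CONFLICT ... DO UPDATE` form with `EXCLUDED`.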

chmnata avatar Oct 28 '22 14:10 chmnata