data-infra
NTD Ingestion – Remaining Tables in Dataset
User story / feature request
Part of #3401
As a data engineer, I would like to ingest the data for the remaining tables in the NTD dataset into GCS blob storage and expose it as external tables, extending the work found in scrape_ntd.py and annual_database_service.yml (listed below).
This builds on the work completed in #3345.
Existing NTD patterns:
- script/scrape_ntd.py
- airflow/dags/create_external_tables/ntd_data_products/annual_database_service.yml
General Cal-ITP Pipeline Patterns
Acceptance Criteria
In the warehouse, I can access the data for the following tables from the https://www.transit.dot.gov/ntd/data-product website as external tables:
- Assets
- Expenses
- Fares/Funding
- Resources
- Security and Safety
- Service (previous years)
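To make the target layout concrete, here is a rough sketch of how each category listed above might map to a GCS object path that an external table could read. Everything in it is an assumption for discussion: the bucket name example-ntd-bucket, the dt=/year= partition layout, the .jsonl.gz extension, and the helpers slugify and blob_path are hypothetical and are not taken from scrape_ntd.py or annual_database_service.yml.

```python
from datetime import date

# Product categories copied from the acceptance criteria above.
NTD_CATEGORIES = [
    "Assets",
    "Expenses",
    "Fares/Funding",
    "Resources",
    "Security and Safety",
    "Service (previous years)",
]


def slugify(name: str) -> str:
    """Lowercase a category name and collapse runs of non-alphanumerics to '_'."""
    out = []
    for ch in name.lower():
        if ch.isalnum():
            out.append(ch)
        elif out and out[-1] != "_":
            out.append("_")
    return "".join(out).strip("_")


def blob_path(
    category: str,
    year: int,
    extracted: date,
    bucket: str = "example-ntd-bucket",  # hypothetical bucket name
) -> str:
    """Build a hive-style partitioned object path (assumed layout, not the existing one)."""
    slug = slugify(category)
    return (
        f"gs://{bucket}/{slug}/"
        f"dt={extracted.isoformat()}/year={year}/{slug}.jsonl.gz"
    )


if __name__ == "__main__":
    # Print one example path per category for a single report year.
    for category in NTD_CATEGORIES:
        print(blob_path(category, 2022, date(2024, 1, 15)))
```

If the existing annual_database_service.yml uses a different prefix or partition scheme, the new tables should follow that scheme instead; the sketch only illustrates one path per category and extraction date.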