Optimizing-Public-Transportation
Optimizing-Public-Transportation copied to clipboard
A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority.
Optimizing-Public-Transportation
Architecture
Overview
In this project we construct a streaming event pipeline around Apache Kafka and its ecosystem. Using public dataset from Chicago Transit Authority we constructed an event pipeline around Kakfa that allows to simulate and display status of train in real time.
Arrival and Turnstiles -> Producers that create train arrival and turnstile information into our kafka cluster. Arrivals indicate that a train has arrived at a particular station while the turnstile event indicate a passanger has entered the station.
Weather -> A REST Proxy prodcer that periodically emits weather data by a REST Proxy and emits that to the kafka cluster.
Postgres SQL and Kafka Connect -> Extract data from stations and push it to kafka cluster.
Kafka status server -> Consumes data from kafka topics and display on the UI.
Environment
- Docker (I used bitnami kafka image available here
- Python 3.7
Running and Testing
First make sure all the service are up and running: For docker use:
docker-compose up
Docker-Compose will take 3-5 minutes to start, depending on your hardware. Once Docker-Compose is ready, make sure the services are running by connecting to them using DOCKER URL provided below:
Also, you need to install requirements as well, use below command to create a virtual environment and install requirements:
-
cd producers
-
virtualenv venv
-
. venv/bin/activate
-
pip install -r requirements.txt
Same for the consumers, setup environment as below:
-
cd consumers
-
virtualenv venv
-
. venv/bin/activate
-
pip install -r requirements.txt
Running Simulation
Run the producers using simulation.py in producers folder:
python simulation.py
Run the Faust Stream Processing Application:
cd consumers
faust -A faust_stream worker -l info
Run KSQL consumer as below:
cd consumers
python ksql.py
To run consumer server:
cd consumers
python server.py
Resources
Confluent Python Client Documentation
Confluent Python Client Usage and Examples
REST Proxy API Reference
Kafka Connect JDBC Source Connector Configuration Options