T-Watch
Real Time Twitter Sentiment Analysis Product
Real Time Twitter Stream Analysis via Kafka and Spark Streaming
Motivation:
Build a data product that processes streaming data through an end-to-end pipeline that can be scaled easily on demand.
Model Training:
- Training a TF-IDF and random forest model as a Spark ML pipeline
- Saving the trained models to S3
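
The training steps above can be sketched roughly as follows with PySpark. This is a minimal illustration, not the project's actual code: the column names (`text`, `label`), the hyperparameters, and the function names `build_sentiment_pipeline` and `train_and_save` are assumptions introduced here, and writing to an `s3a://` path assumes the Hadoop S3 connector is configured on the cluster.

```python
# Assumed column names and hyperparameters; the project does not state them.
TEXT_COL = "text"
LABEL_COL = "label"      # numeric class index (0.0, 1.0, ...)
NUM_FEATURES = 1 << 16   # size of the hashed term-frequency vector
NUM_TREES = 50


def build_sentiment_pipeline():
    """Return an unfitted Spark ML Pipeline: tokenize -> TF -> IDF -> random forest.

    Must be called while a SparkSession is active.
    """
    # Imported lazily so this module can be loaded without a Spark installation.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import HashingTF, IDF, Tokenizer

    tokenizer = Tokenizer(inputCol=TEXT_COL, outputCol="words")
    term_freq = HashingTF(inputCol="words", outputCol="tf",
                          numFeatures=NUM_FEATURES)
    idf = IDF(inputCol="tf", outputCol="features")
    forest = RandomForestClassifier(labelCol=LABEL_COL,
                                    featuresCol="features",
                                    numTrees=NUM_TREES)
    return Pipeline(stages=[tokenizer, term_freq, idf, forest])


def train_and_save(train_df, model_path):
    """Fit the pipeline on a labelled DataFrame and persist the fitted model.

    model_path is a placeholder; it could be a local path or an s3a:// URI.
    """
    model = build_sentiment_pipeline().fit(train_df)
    model.write().overwrite().save(model_path)
    return model
```

Saving the fitted `PipelineModel` as a whole, rather than each stage separately, keeps the TF-IDF vocabulary weights and the classifier together, so the streaming job can restore everything with a single `PipelineModel.load(model_path)` call.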
Real Time Analysis:
- Collecting real-time Twitter streams through Kafka
- Integrating Kafka with Spark Streaming
- Loading the saved model in Spark Streaming to predict sentiment for incoming tweets
- Storing the incoming tweets in MongoDB from Spark Streaming
- Fetching data from MongoDB and publishing the results in a web application built with Flask
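
As an illustration of the per-message work between Kafka and MongoDB, the sketch below turns one raw message into a document ready to be stored. It is only a sketch under assumptions: it assumes the Kafka producer forwards tweets as UTF-8 JSON with the `text` and `id_str` fields, and `tweet_to_document`, `fake_predict`, and the output field names are hypothetical. The predictor is passed in as a function so that the same code can wrap the loaded Spark model in the streaming job or a stub in tests.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Optional


def tweet_to_document(raw: bytes,
                      predict: Callable[[str], float]) -> Optional[dict]:
    """Turn one Kafka message value into a MongoDB-ready document.

    Returns None for payloads that cannot be scored.
    """
    try:
        tweet = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # skip malformed payloads instead of failing the batch
    if not isinstance(tweet, dict):
        return None
    text = tweet.get("text")
    if not isinstance(text, str) or not text.strip():
        return None  # e.g. stream control messages that carry no tweet text
    return {
        "tweet_id": tweet.get("id_str"),
        "text": text,
        "sentiment": float(predict(text)),
        "processed_at": datetime.now(timezone.utc).isoformat(),
    }


# Stand-in predictor for demonstration; the real one would call the saved model.
def fake_predict(text: str) -> float:
    return 1.0 if "good" in text.lower() else 0.0


doc = tweet_to_document(b'{"id_str": "1", "text": "Good morning"}', fake_predict)
print(doc["sentiment"])  # prints 1.0
```

In a full pipeline, the returned dictionaries would be written to a MongoDB collection by the streaming job, and the Flask application would query that collection to render results.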