Real_Time_Social_Media_Mining
DevOps pipeline for Real-Time Social/Web Mining
Workflow
- [x] Set up Apache Maven for the Java project (user interface and MapReduce functions)
- [x] Set up the GitHub repository workflow
- [x] Set up GitHub Actions for automation
- [x] Create a web crawler in Python using the Tweepy library to fetch tweets matching given parameters
- [x] Create an HDFS cluster for MapReduce and program Hadoop MapReduce in Java
- [x] Set up Hadoop Core and create the Job Tracker and Task Trackers for the project
- [x] Implement MapReduce on HDFS in Java to count the frequency of significant words from the data dictionary in tweet text
- [x] Configure Apache Maven for the MapReduce code and install the Apache Hadoop JAR dependency
- [x] Configure the MapReduce code in GitHub Actions for automation
- [x] Automate the Big Data pipeline up to MapReduce using GitHub Actions
- [x] Write a Java program that runs MapReduce over the JSON file extracted by the crawler to find the frequency of significant words (textual analysis)
- [x] Data classification: create a multi-class data dictionary for sentiment analysis, currently word-level (it may later be extended to phrases and sentences for improved accuracy)
- [x] Data prediction: use the KNN algorithm in Python to find the relation between tweets and their sentiments
- [x] Data visualization: implement visualization with the Python matplotlib library
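The central word-frequency step above (counting significant dictionary words in tweet text) can be sketched in a few lines. This is a minimal pure-Python illustration of the map/reduce logic only; the project itself implements it in Java on Hadoop, and the sentiment dictionary and tweets below are hypothetical examples, not the project's data.

```python
from collections import defaultdict

# Hypothetical dictionary of "significant" words (the project's
# multi-class data dictionary is much larger).
DICTIONARY = {"happy", "sad", "great", "terrible"}

def map_words(tweet_text):
    """Map step: emit (word, 1) for each dictionary word in the tweet."""
    for word in tweet_text.lower().split():
        word = word.strip(".,!?")
        if word in DICTIONARY:
            yield word, 1

def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

tweets = ["Great day, so happy", "sad news, terrible and sad"]
pairs = [p for t in tweets for p in map_words(t)]
print(reduce_counts(pairs))  # word -> frequency
```

In the real pipeline these two functions correspond to the Mapper and Reducer classes of the Java job, with Hadoop handling the shuffle of (word, 1) pairs between them.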
Important Source Files and Dependencies
- pom.xml - Apache Maven setup
- helloworld.java - basic Java project setup
- maven.yml - GitHub Actions setup
- crawler.py - Python web crawler that extracts Twitter data based on specific hashtags
- info.csv - data file written by the crawler and sent to the HDFS core for processing
- MapReduce functionality in Java
  - Map function
  - Reduce function
  - main Java code
- Sentiment analysis in Python
  - Convolutional Neural Networks
  - Decision Tree
  - SVM
  - Pre-processing
  - Random Forests
  - Naive Bayes
  - XGBoost
- matplotlib.py - data visualization using matplotlib in Python
- Hadoop Setup
  - Hadoop Core setup
  - HDFS setup
  - MapReduce in the Task Tracker
  - MapReduce in the Job Tracker
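The data-prediction step uses KNN in Python to relate tweets to sentiments. As a minimal from-scratch sketch of that idea, the snippet below classifies a tweet by majority vote among its k nearest labelled neighbours over bag-of-words vectors; the tiny labelled tweet set is purely hypothetical (the project trains on the crawler's CSV output).

```python
from collections import Counter
import math

def vectorize(text, vocab):
    """Bag-of-words vector for a tweet over a fixed vocabulary."""
    words = Counter(text.lower().split())
    return [words[w] for w in vocab]

def knn_predict(query, examples, k=3):
    """Classify by majority vote among the k nearest labelled tweets."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    neighbours = sorted(examples, key=lambda ex: dist(query, ex[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Hypothetical labelled training tweets.
train = [("so happy and great", "positive"),
         ("great great day", "positive"),
         ("sad and terrible", "negative"),
         ("terrible news today", "negative")]
vocab = sorted({w for text, _ in train for w in text.split()})
examples = [(vectorize(text, vocab), label) for text, label in train]

print(knn_predict(vectorize("what a great happy day", vocab), examples))  # -> positive
```

Euclidean distance over raw counts is the simplest choice; TF-IDF weighting or cosine distance would be natural refinements for real tweet data.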
How to Contribute
This is an open-source project and everyone is welcome to contribute.
Please follow these contribution guidelines.
License
MIT License, copyright Storms In Brewing (2019-2020)