Detecting-Malicious-URL-Machine-Learning
Detecting-Malicious-URL-Using-Pyspark
Development Environment
- Apache Spark 2.3.0
- Jupyter Notebook
Datasets
The datasets used in this project were collected manually from the following sources:
Phishing URLs
- PhishTank - https://www.phishtank.com/developer_info.php
- OpenPhish - https://openphish.com/
Spam URLs
- jwSpamSpy - http://www.joewein.de/sw/blacklist.htm
Malware URLs
- DNS-BH - http://www.malwaredomains.com/wordpress/?page_id=66
- Malware Patrol - https://www.malwarepatrol.net/my-account/
- Malware Domain List - http://www.malwaredomainlist.com/
Benign URLs
- Majestic - https://majestic.com/reports/majestic-million
Another Useful Source for Collecting Malicious URLs
- https://zeltser.com/malicious-ip-blocklists/
The Dataset.csv file used in this project combines the sources above. A data pre-processing program was used to clean, filter, and label the data, so the dataset is already labelled and ready to use in the project.
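The pre-processing program itself is not described here, so the following is only a minimal sketch, in plain Python, of how the per-category source lists could be cleaned, merged, and labelled into a single CSV. The label names (for example `phishing`, `spam`, `malware`, `benign`), the `url,label` column layout, the rule of dropping URLs that appear under more than one label, and the helper names `clean_lines`, `combine_sources`, and `to_csv` are all hypothetical assumptions, not the actual format of Dataset.csv.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Hypothetical sketch: label names and the url,label layout are assumptions,
# not the documented format of this project's Dataset.csv.


def clean_lines(lines: Iterable[str]) -> List[str]:
    """Strip whitespace and drop blank lines and '#' comment lines."""
    cleaned = []
    for line in lines:
        url = line.strip()
        if url and not url.startswith("#"):
            cleaned.append(url)
    return cleaned


def combine_sources(sources: Dict[str, Iterable[str]]) -> List[Tuple[str, str]]:
    """Merge per-label URL lists into (url, label) rows.

    A URL repeated under the same label is kept once; a URL that appears
    under more than one label is treated as ambiguous and dropped.
    Rows keep the order in which URLs were first seen.
    """
    labels_by_url: Dict[str, set] = {}
    order: List[str] = []
    for label, lines in sources.items():
        for url in clean_lines(lines):
            if url not in labels_by_url:
                labels_by_url[url] = set()
                order.append(url)
            labels_by_url[url].add(label)
    return [
        (url, next(iter(labels_by_url[url])))
        for url in order
        if len(labels_by_url[url]) == 1
    ]


def to_csv(rows: List[Tuple[str, str]]) -> str:
    """Serialize rows to CSV text with a 'url,label' header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["url", "label"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Toy input standing in for downloaded blacklist/whitelist files.
    sources = {
        "phishing": ["http://phish.example/login\n", "# comment\n", ""],
        "malware": ["http://bad.example/payload.exe"],
        "benign": ["google.com", "http://bad.example/payload.exe"],
    }
    print(to_csv(combine_sources(sources)), end="")
```

Run on the toy input, the conflicting `http://bad.example/payload.exe` entry is dropped and only the phishing and benign rows remain. Dropping conflicts rather than picking one label is a conservative choice; a real pipeline might instead prefer the malicious label or resolve conflicts by source reliability.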