Detecting-Malicious-URL-Using-Pyspark

Development Environment

Apache Spark 2.3.0
Jupyter Notebook

Datasets

The datasets used in this project were obtained manually from the following sources:

Phishing URLs

Phishtank - https://www.phishtank.com/developer_info.php
OpenPhish - https://openphish.com/

Spam URLs

jwSpamSpy - http://www.joewein.de/sw/blacklist.htm

Malware URLs

DNS-BH - http://www.malwaredomains.com/wordpress/?page_id=66
Malware Patrol - https://www.malwarepatrol.net/my-account/
Malware Domain List - http://www.malwaredomainlist.com/

Benign URLs

Majestic - https://majestic.com/reports/majestic-million

Another Useful Source for Collecting Malicious URLs

https://zeltser.com/malicious-ip-blocklists/

The Dataset.csv file used in this project combines the sources above. A data pre-processing program cleans and filters the raw data, so the dataset is already labelled and ready to use.
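The pre-processing program itself is not included here, so the following is only a minimal sketch of how several source lists might be merged into one labelled CSV. Everything in it is an assumption made for illustration: the function names (`normalize`, `build_dataset`, `write_csv`), the `url` and `label` column names, the label strings, and the rule that the first source to list a URL decides its label. The actual Dataset.csv may be laid out differently.

```python
import csv
import io
from urllib.parse import urlsplit


def normalize(raw):
    """Return a cleaned URL, or None if the line is unusable (hypothetical helper)."""
    url = raw.strip()
    # Skip blank lines and comment lines, which many blocklist feeds contain.
    if not url or url.startswith("#"):
        return None
    # Some feeds list bare domains; add a scheme so the URL can be parsed.
    if "://" not in url:
        url = "http://" + url
    try:
        parts = urlsplit(url)
    except ValueError:
        return None
    if not parts.netloc:
        return None
    # Scheme and host are case-insensitive, so lowercase them for deduplication.
    parts = parts._replace(scheme=parts.scheme.lower(), netloc=parts.netloc.lower())
    return parts.geturl()


def build_dataset(sources):
    """Merge (label, lines) pairs into unique (url, label) rows.

    Assumption: when a URL appears in several sources, the first label wins.
    """
    seen = {}
    for label, lines in sources:
        for line in lines:
            url = normalize(line)
            if url is not None and url not in seen:
                seen[url] = label
    return list(seen.items())


def write_csv(rows, handle):
    """Write the rows with an assumed 'url,label' header."""
    writer = csv.writer(handle)
    writer.writerow(["url", "label"])
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample lines standing in for the downloaded source lists.
    sample_sources = [
        ("phishing", ["http://Phish.example/login ", "# feed comment"]),
        ("malware", ["bad.example/payload.exe", ""]),
        ("benign", ["example.org", "http://phish.example/login"]),
    ]
    buffer = io.StringIO()
    write_csv(build_dataset(sample_sources), buffer)
    print(buffer.getvalue())
```

Listing the malicious sources before the benign one means a URL that appears in both keeps its malicious label, which is the more cautious choice for a detection dataset.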
