ml-ids
ml-ids copied to clipboard
:mag: Machine-learning-based intrusion detection
Machine-learning based intrusion detection
Downloading the Datasets
Download the 1999 DARPA IDS Dataset, and the 1999 KDD Dataset by running
cd data
./download_data.sh
This takes about 30 minutes (depending on your internet connection) and downloads the inside TCPDUMP files from the dataset (~18GB) organized into training and test sets, as well as a sample of the KDD dataset.
1999 DARPA Evaluation Labels
A description of how evaluation is performed for the DARPA dataset, as well as ground truth files can be found on the DARPA Dataset Documentation page.
Experiment Files
Our various experiments are organized as Python files in the root of the repository. Each of the experiments is explained below.
gmm.py- Mixture of Gaussian experiment- This experiment uses a stationary mixture of Gaussian model by leveraging the sklearn library.
phad-c32.py- Our Python implementation of the PHAD-C32 algorithm described in the original paper
phad_feat_all_but_one.py- A feature ablation experiment for the PHAD algorithm which iteratively tests all features except for one.
phad_ttl_only.py- A simplified version of PHAD with only uses the
IPv4_ttlpacket field as a feature.
- A simplified version of PHAD with only uses the
kdd_knn.py- A K-nearest neighbor model and feature ablation experiment for the KDD dataset using the KDD features which iteratively tests a single feature at a time.
Checking Results
check_results.py is a simple script used for checking the results of each
experiment.
usage: check_results.py [-h] [--thresh THRESH] [--plot] [--table TABLE]
results_file attacks_file
positional arguments:
results_file the results.csv file
attacks_file the actual attacks file
optional arguments:
-h, --help show this help message and exit
--thresh THRESH range of thresholds to try. Format: start:stop:num_points,
default: 0.5:0.5:1
--plot make plots
--table TABLE make table using the specified threshold
Generating Plots
The plots we used in the poster and paper were generated using the scripts in
plotting/.
Running Tests
To run tests locally, run
python -m unittest discover
from the root folder of the repository.