ML-based-network-traffic-classifier
ML-based-network-traffic-classifier copied to clipboard
Network traffic classifier based on machine learning algorithms
Network traffic classifier based on statistical properties of application flows
UPDATE 18/03/2019: Refactored in OOP-style, more flexibility and features!
UPDATE 23/05/2020: Replaced custom flow-parsing mechanism with NFStream
UPDATE 17/09/2020: Added pytorch classifiers, including transformer-based one
UPDATE 30/10/2020: ANN classifiers (NGT, LSH), FS-NET baseline
Key features
-
Configurable feature extraction from network flows via
NFStream
. -
Possibility to test arbitrary sklearn algorithms (e.g. SVM, Random Forest, etc.) and configure their parameter search space via
.yaml
configs. -
Basic examples of pytorch classifiers and new generative transformer framework that can be used for building traffic generators and classifiers.
-
Option for experiment tracking with Neptune.
Project structure
-
flow_parsing
contains scripts for parsing flow features and labels from.pcap
into.csv
viaNFStream
. It can be used for exporting raw per-flow packet-features (e.g. packet/payload sizes, timestamps, various packet-fields) in a numpy array, as well as derivative statistics, such as feature percentiles, etc. -
evaluation_utils
contains utilities for evaluation of traffic classifiers and generators. -
fs_net
is a reimplementation of FS-NET classifier -
sklearn_classifiers
contains wrapper for sklearn-like classifiers and example pipeline script. Used models and their parameters are specified via the.yaml
configuration file. Check and modifyutils.py:REGISTERED_CLASSES
to support the needed models. -
nn_classifers
includes base class for pytorch-lightning classifier and some basic derivatives. -
gpt_model
has all the code required for building your own transformer-based traffic generator and classifier, along with a link to model checkpoints. See the package for more info.
Usage example for sklearn-based classifiers
-
A feature file has to be prepared before running model training, so make sure to create a
.csv
dataset by running, for example:PYTHONPATH=. python flow_parsing/pcap_parser.py -p flow_parsing/static/example.pcap --online_mode
-
OPTIONAL. Postprocess parsed
.csv
as needed, e.g. split into train-test, reassign target columns. -
Create own version of
config.yaml
to experiment with and test classifiers:PYTHONPATH=. python sklearn_classifiers/run_training.py --train_dataset csv_files/example_20packets.csv --target_column ndpi_category --continuous
Publications
If you find the code or datasets useful for your research, please, cite one of the following papers:
@article{Bikmukhamedov2021MultiClassNT,
title={Multi-Class Network Traffic Generators and Classifiers Based on Neural Networks},
author={Bikmukhamedov, Radion and Nadeev, Adel},
journal={2021 Systems of Signals Generating and Processing in the Field of on Board Communications},
year={2021},
pages={1-7},
url = {https://doi.org/10.1109/IEEECONF51389.2021.9416067}
}
@CONFERENCE{bikmukhamedov2019,
author = {Bikmukhamedov, R. F. and Nadeev, A. F.},
title = {Lightweight Machine Learning Classifiers of IoT Traffic Flows},
booktitle = {2019 Systems of Signal Synchronization, Generating and Processing in Telecommunications},
year = {2019},
}