To change the LSTM to use numbers instead of the discretization
In the following, we present the performance evaluation of four different machine learning models trained on the dataset `modules/rnn_cc_detection/datasets/dataset_more_labels.dat`. The models evaluated are Random Forest, Support Vector Machine (SVM), k-Nearest Neighbors (KNN), and a Recurrent Neural Network (RNN). The dataset comprises features labelled with binary classes. Issue #316 proposes using one-hot encoding, but in the following we map the StratoLetters to integers instead.
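As a rough illustration of the two encoding options, here is a minimal sketch of the integer mapping versus one-hot encoding of a StratoLetter sequence. The alphabet string and helper names below are placeholders for illustration only, not the actual vocabulary or code used in slips.

```python
import numpy as np

STRATO_LETTERS = "abcdefghiABCDEFGHI"  # placeholder alphabet, not the real StratoLetter vocabulary

# Option used in this evaluation: map each StratoLetter to an integer id (0 reserved for padding/unknown).
letter_to_int = {ch: idx + 1 for idx, ch in enumerate(STRATO_LETTERS)}

def encode_as_integers(state: str) -> np.ndarray:
    """Encode a StratoLetter state string as a vector of integer ids."""
    return np.array([letter_to_int.get(ch, 0) for ch in state], dtype=np.int32)

# Alternative proposed in #316: one-hot encode each letter instead.
def encode_as_one_hot(state: str) -> np.ndarray:
    """Encode a StratoLetter state string as a (sequence_length, vocab_size) one-hot matrix."""
    ids = encode_as_integers(state)
    one_hot = np.zeros((len(ids), len(STRATO_LETTERS) + 1), dtype=np.float32)
    one_hot[np.arange(len(ids)), ids] = 1.0
    return one_hot

print(encode_as_integers("aaAbB"))       # [ 1  1 10  2 11]
print(encode_as_one_hot("aaAbB").shape)  # (5, 19)
```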
Model Overview:
- Random Forest:
  - Accuracy: 1.0
  - F1 Score: 1.0
  - Methodology: A Random Forest classifier trained with 100 decision trees.
- Support Vector Machine (SVM):
  - Accuracy: 0.8461
  - F1 Score: 0.9166
  - Methodology: An SVM with a radial basis function (RBF) kernel; the features were scaled using StandardScaler.
- k-Nearest Neighbors (KNN):
  - Accuracy: 0.7692
  - F1 Score: 0.8695
  - Methodology: A KNN classifier with 5 neighbors; the features were scaled using StandardScaler.
- Recurrent Neural Network (RNN):
  - Accuracy: 0.8461
  - Loss: 0.6770
  - Methodology: A Bidirectional GRU with dropout layers, trained for 10 epochs with the RMSprop optimizer and evaluated on the test dataset (see the combined training sketch after this list).
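The following is a condensed sketch of how the four configurations above could be trained and scored. It is an approximation rather than the exact slips code: the stand-in data, sequence shape, and any hyperparameters not named above (embedding size, GRU units, dropout rate, batch size) are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from tensorflow.keras import Sequential, layers, optimizers

# Stand-in data so the sketch runs end to end: 62 integer-encoded sequences with binary
# labels. The real features come from dataset_more_labels.dat; the shape here is assumed.
rng = np.random.default_rng(42)
X = rng.integers(1, 19, size=(62, 20))
y = rng.integers(0, 2, size=62)

# All models share one stratified train/test split, as noted in the discussion below.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Traditional models: Random Forest with 100 trees, RBF SVM and 5-NN with scaled features.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "KNN (k=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(name, accuracy_score(y_test, preds), f1_score(y_test, preds))

# RNN: Bidirectional GRU with dropout, trained for 10 epochs with RMSprop.
vocab_size = int(X.max()) + 1
rnn = Sequential([
    layers.Embedding(input_dim=vocab_size, output_dim=16),
    layers.Bidirectional(layers.GRU(32)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
rnn.compile(optimizer=optimizers.RMSprop(), loss="binary_crossentropy", metrics=["accuracy"])
rnn.fit(X_train, y_train, epochs=10, batch_size=8, verbose=0)
loss, acc = rnn.evaluate(X_test, y_test, verbose=0)
print("RNN", acc, loss)
```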
Discussion:
- Among the traditional machine learning models evaluated, the Random Forest achieved the highest accuracy (100%).
- The dataset has only 62 records; it is expected that increasing the number of records will improve the models' accuracy.
- All models were trained and tested on the same dataset split, ensuring a fair comparison of their performance metrics.
Details are available at https://github.com/stratosphereips/StratosphereLinuxIPS/commit/7fbc2ce2943cd64c64b19865a588881c2f2fea6c