To change the LSTM into using numbers and not the discretization

Open AlyaGomaa opened this issue 2 years ago • 1 comments

Created by Alya Gomaa via monday.com integration. 🎉

AlyaGomaa, Feb 10 '23 13:02

In the following, we present the performance evaluation of four machine learning models trained on the dataset "modules/rnn_cc_detection/datasets/dataset_more_labels.dat": Random Forest, Support Vector Machine (SVM), k-Nearest Neighbors (KNN), and a Recurrent Neural Network (RNN). The dataset consists of features labelled with binary classes. #316 proposes one-hot encoding; for this evaluation, we instead mapped the StratoLetter symbols to integers.
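As a rough illustration of the integer mapping mentioned above, here is a minimal sketch in plain Python. The helper names (`build_vocab`, `encode`, `one_hot`), the sample letter strings, and the choice to reserve 0 for padding are all assumptions made for this example; the mapping actually used is in the linked commit.

```python
def build_vocab(sequences):
    """Assign each distinct symbol an integer ID, reserving 0 for padding."""
    symbols = sorted({ch for seq in sequences for ch in seq})
    return {ch: i for i, ch in enumerate(symbols, start=1)}


def encode(seq, vocab, max_len):
    """Map a letter string to integer IDs, truncated or zero-padded to max_len."""
    ids = [vocab[ch] for ch in seq[:max_len]]
    return ids + [0] * (max_len - len(ids))


def one_hot(ids, vocab_size):
    """One-hot alternative: one vector of length vocab_size + 1 per ID."""
    return [[1 if i == j else 0 for j in range(vocab_size + 1)] for i in ids]


# Hypothetical letter strings, used only to exercise the helpers.
sequences = ["88*y*y*h", "a,a,b.", "99+Z"]
vocab = build_vocab(sequences)
encoded = [encode(s, vocab, max_len=8) for s in sequences]
```

One trade-off worth keeping in mind: integer IDs give a compact input but impose an arbitrary ordering on the symbols, whereas one-hot vectors avoid that ordering at the cost of a wider input.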

Model Overview:

  1. Random Forest:
    • Accuracy: 1
    • F1 Score: 1
    • Methodology: The Random Forest classifier, trained with 100 decision trees, achieved an accuracy of 100% and an F1 score of 1.
  2. Support Vector Machine (SVM):
    • Accuracy: 0.8461
    • F1 Score: 0.9166
    • Methodology: The SVM model, using a radial basis function (RBF) kernel, attained an accuracy of about 84.6% and an F1 score of about 0.92. The features were scaled using StandardScaler.
  3. k-Nearest Neighbors (KNN):
    • Accuracy: 0.7692
    • F1 Score: 0.8695
    • Methodology: The KNN classifier with 5 neighbors achieved an accuracy of about 76.9% and an F1 score of about 0.87. The features were scaled using StandardScaler.
  4. Recurrent Neural Network (RNN):
    • Accuracy: 0.8461
    • Loss: 0.6770
    • Methodology: The RNN model, a Bidirectional GRU with dropout layers, achieved an accuracy of about 84.6% on the test dataset. It was trained for 10 epochs using the RMSprop optimizer.
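The three classical models listed above can be set up along the following lines with scikit-learn. This is only a sketch under stated assumptions: the data is a synthetic stand-in from `make_classification`, and the 80/20 stratified split, the seed, and the `evaluate_models` helper are illustrative choices that are not taken from the issue. Only the hyperparameters named above (100 trees, RBF kernel, 5 neighbors, StandardScaler) come from the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_models(X, y, seed=0):
    """Fit each model on one shared split and report accuracy and F1."""
    # Assumption: 80/20 stratified split; the issue does not state the ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    models = {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        # SVM and KNN are distance-based, so features are standardized first.
        "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "KNN (k=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "f1": f1_score(y_test, pred),
            "n_test": len(y_test),
        }
    return results


# Synthetic stand-in with the same record count as the dataset (62).
X, y = make_classification(n_samples=62, n_features=10, random_state=0)
results = evaluate_models(X, y)
for name, scores in results.items():
    print(name, scores)
```

Wrapping the scaler and the estimator in one pipeline keeps the scaling statistics fitted on the training split only, which avoids leaking test data into the preprocessing step.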

Discussion:

  • The Random Forest model achieved the highest accuracy of the models evaluated, at 100%.
  • The dataset has only 62 records. We expect the models' accuracy to improve as the number of records increases.
  • All models were trained and tested on the same dataset split, ensuring a fair comparison of their performance metrics.
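To put the dataset size in perspective, the reported scores can be checked against small integer counts. The sketch below assumes a 13-record test set, which is an inference from the numbers and is not stated in the issue; the true-positive and error counts are likewise one set of values that happens to reproduce the reported figures when truncated to four decimals.

```python
import math


def f1(tp, fp, fn):
    """F1 score from confusion-matrix counts: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


def truncate(x, digits=4):
    """Cut x to a fixed number of decimals without rounding."""
    factor = 10 ** digits
    return math.floor(x * factor) / factor


n_test = 13  # assumption: inferred from the reported accuracies, not from the source

svm_acc = truncate(11 / n_test)  # 11 of 13 correct
knn_acc = truncate(10 / n_test)  # 10 of 13 correct

# Only the sum FP + FN affects F1 for a given TP, so the split between
# false positives and false negatives here is arbitrary.
svm_f1 = truncate(f1(tp=11, fp=2, fn=0))
knn_f1 = truncate(f1(tp=10, fp=3, fn=0))

print(svm_acc, svm_f1, knn_acc, knn_f1)
```

If the assumption holds, a single test record changes accuracy by 1/13, or roughly 7.7 percentage points, so differences between the models amount to one or two records.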

Details are available at https://github.com/stratosphereips/StratosphereLinuxIPS/commit/7fbc2ce2943cd64c64b19865a588881c2f2fea6c

tahifahimi, Feb 28 '24 20:02