Deep-Packet

About the missing data set categories

Open HERMIT-OuO opened this issue 2 years ago • 2 comments

Hi.

I am trying to preprocess ISCXVPN2016 and split it into training and test sets, but the dataset does not seem to contain a torrent01 file.

So I downloaded your processed dataset instead. When I checked the per-label sample counts, I found that the labels for the category-classification task are distributed as follows.

    label  count                                                                
0       0  12731
1       7  12731
2       6  12731
3       5  12731
4       1  12731
5      10  12731
6       3  12731
7       8  12731
8      11  12731
9       2  12731
10      4  12731

    label    count
0       0    23990
1       7    25344
2       6     8480
3       5   958956
4       1    13582
5      10    53498
6       3    18473
7       8     3260
8      11   179758
9       2  1236595
10      4    14258

It looks like there are only 11 categories instead of 12 (label 9 never appears). Is this a mistake on my part?
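A quick way to confirm which label id is absent, as a minimal sketch: `missing_labels` is a hypothetical helper (not part of this repository), and the assumption that ids run from 0 to 11 comes only from the 12-class task discussed here. Which traffic category id 9 corresponds to is not stated in the tables.

```python
from collections import Counter


def missing_labels(labels, num_classes):
    """Return the label ids in range(num_classes) that never occur in `labels`."""
    counts = Counter(labels)
    # Counter returns 0 for keys it has never seen, so absent ids are easy to spot.
    return [i for i in range(num_classes) if counts[i] == 0]


# Label ids observed in the tables above (counts omitted for brevity).
observed = [0, 7, 6, 5, 1, 10, 3, 8, 11, 2, 4]
print(missing_labels(observed, num_classes=12))  # prints [9]
```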

HERMIT-OuO avatar Jun 25 '22 12:06 HERMIT-OuO

Coincidentally, I downloaded the original ISCXVPN2016 dataset from the UNB CIC website and found that the P2P (Torrent) PCAP file is indeed missing. Yet many papers emphasize a 16-class application classification task and a 12-class service classification task.

I don't know how to deal with this problem.

JieJayCao avatar Jun 30 '22 03:06 JieJayCao

I have looked at many papers that use ISCXVPN2016 as a dataset, and nearly all of them include P2P as a classification category. I don't know how the authors managed this.

:)

HERMIT-OuO avatar Jul 01 '22 02:07 HERMIT-OuO

I also found this problem. Have you solved this problem now?

Pau1code avatar Sep 23 '22 00:09 Pau1code

> I also found this problem. Have you solved this problem now?

You can simply remove this category and run your experiments without much impact, or use the processed dataset provided by this repository's author.
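If you drop a category, the remaining label ids may no longer be contiguous, which some training code expects. The following is only a sketch on toy data: `drop_and_remap` is a hypothetical helper, not something this repository provides, and the sample tuples are made up for illustration.

```python
def drop_and_remap(samples, drop_label):
    """Remove samples labelled `drop_label` and renumber the rest as 0..K-1."""
    kept = [(x, y) for x, y in samples if y != drop_label]
    remaining = sorted({y for _, y in kept})
    # Map each surviving old id to a new contiguous id, preserving order.
    new_id = {old: new for new, old in enumerate(remaining)}
    return [(x, new_id[y]) for x, y in kept], new_id


# Toy (feature, label) pairs; real samples would be packet feature vectors.
samples = [("a", 0), ("b", 9), ("c", 10), ("d", 11)]
remapped, mapping = drop_and_remap(samples, drop_label=9)
print(remapped)  # [('a', 0), ('c', 1), ('d', 2)]
print(mapping)   # {0: 0, 10: 1, 11: 2}
```

Keep the returned mapping around so that predictions can be translated back to the original label ids when comparing with other papers.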

JieJayCao avatar Sep 25 '22 13:09 JieJayCao

They used to include torrent01.pcap in their dataset but have since removed it. The full dataset was updated in 2021. I retrained the model.

munhouiani avatar Sep 28 '22 03:09 munhouiani