I am puzzled by the results presented in your paper.

GuanlinLee opened this issue 6 years ago • 12 comments

In your paper you mention that the mean rank of the model is close to 0, yet the average correct rate of the model is only about 0.004. I would expect a mean rank close to 0 to imply a correct rate close to 100%. Why is this not the case?

GuanlinLee avatar Aug 09 '18 00:08 GuanlinLee

The mean accuracy (i.e., what you refer to as the "average correct rate") measures the capacity of the model to correctly classify a SINGLE trace/observation. With our models we get between 0.004 and 0.006 (while a random classification among 256 classes would be close to 1/256 ≈ 0.004). This accuracy is indeed not very high, but as discussed in the paper it is not equivalent to our target metric, which measures the capacity of the model to correctly recover the key hypothesis from SEVERAL traces/observations. The latter is measured with the mean rank, which combines the scores (roughly speaking, the outputs of the model for several traces) according to Eqns (2) and (3) in the paper.
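For illustration, here is a minimal sketch of how per-trace model outputs can be combined into a key rank, in the spirit of Eqns (2) and (3). `predictions`, `plaintexts`, `correct_key` and `AES_Sbox` are placeholders for the softmax outputs on the attack traces, the corresponding plaintext bytes, the true key byte and the standard AES S-box table; this is not the repository's exact ranking code.

```python
import numpy as np

def rank_correct_key(predictions, plaintexts, correct_key, AES_Sbox):
    """Accumulate per-trace log-scores for each key hypothesis and
    return the rank of the correct key (0 = best)."""
    log_scores = np.zeros(256)
    for p, probs in zip(plaintexts, predictions):
        # Under key hypothesis k, the label of this trace is Sbox(p ^ k).
        labels = AES_Sbox[p ^ np.arange(256)]
        log_scores += np.log(probs[labels] + 1e-40)
    # Sort hypotheses by decreasing score; rank 0 means the correct key wins.
    order = np.argsort(log_scores)[::-1]
    return int(np.where(order == correct_key)[0][0])
```

Even with a per-trace accuracy barely above 1/256, the accumulated log-scores let the correct key hypothesis pull ahead once enough traces are combined.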

prouff avatar Aug 22 '18 09:08 prouff

@prouff I used your code and did an 8:2 split of the training set during training, using the 20% as a validation set. But I found that the model I trained with your parameters does not behave like the one shown in your paper, and the fitting behaviour you describe does not appear. Instead, performance on the validation set already degrades in the first few epochs of training. Please ignore the model names in my images; they simply correspond to the fully connected and convolutional networks.

[Figures: performance of best_model_desync{0,50,100}_cnn_net.h5 and my_mlp_best_desync{0,50,100}_epochs200_batchsize100.h5 against the corresponding ASCAD test sets]

GuanlinLee avatar Aug 24 '18 05:08 GuanlinLee

Hi,

To answer your questions precisely, we need more information on the tests that you performed: which model, with which training/testing parameters, against traces with which modification (e.g., desynchronization). Could you please send us this information?

Regards,

Emmanuel


prouff avatar Aug 24 '18 15:08 prouff

I have not made any changes to your test scripts. As for the training process, I changed the batch size of the convolutional network to 64 and left the fully connected network unmodified. I did not change the learning rate or the optimizer. I simply wanted to verify whether the fitting behaviour you describe is real, so I split the training set 8:2 and saved the model with the best validation accuracy.
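For reference, a minimal sketch of this 8:2 split and best-validation-accuracy checkpointing; the toy data and the small MLP are placeholders, not the repository's actual training script or the paper's architecture.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint
from keras.utils import to_categorical

# Toy stand-in data; in practice X, Y are the profiling traces/labels
# read from the ASCAD HDF5 file.
X = np.random.rand(1000, 700).astype("float32")
Y = np.random.randint(0, 256, 1000)

# Small placeholder MLP, not the paper's architecture.
model = Sequential([Dense(200, activation="relu", input_shape=(700,)),
                    Dense(256, activation="softmax")])
model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Keep only the weights with the best validation accuracy
# ("val_acc" on older Keras, "val_accuracy" on newer versions).
checkpoint = ModelCheckpoint("my_mlp_best.h5", monitor="val_acc",
                             save_best_only=True)

model.fit(X, to_categorical(Y, num_classes=256),
          validation_split=0.2,          # the 8:2 split mentioned above
          epochs=200, batch_size=100,
          callbacks=[checkpoint])
```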

GuanlinLee avatar Aug 24 '18 15:08 GuanlinLee

ASCAD.zip

GuanlinLee avatar Aug 24 '18 15:08 GuanlinLee

Thanks for sharing all the information. To me, your results are quite consistent with those reported in the paper (to get more reliable information you should perhaps average the results over several tests, e.g., perform cross-validation): the rank of the correct key converges towards 0 when there is no desynchronization, and does not show significant convergence when the desynchronization equals 50 or 100. On our side, we got good convergence with 100 epochs and around 400 traces. On your side, it seems that you need around 2000 traces to reach the same kind of rank. To understand this difference, it would be good to perform a cross-validation (maybe your training instance was simply not that good).
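As an illustration of the averaging suggested here, a minimal sketch that draws several random subsets of attack traces and averages the rank evolution; the inputs are the same placeholders as in the earlier rank sketch, and the score accumulation is simply repeated incrementally per run.

```python
import numpy as np

def mean_rank_curve(predictions, plaintexts, correct_key, AES_Sbox,
                    n_traces=2000, n_runs=10, seed=0):
    """Average the rank evolution over n_runs random draws of n_traces attack traces."""
    rng = np.random.RandomState(seed)
    curves = np.zeros((n_runs, n_traces))
    for r in range(n_runs):
        idx = rng.choice(len(predictions), n_traces, replace=False)
        log_scores = np.zeros(256)
        for t, i in enumerate(idx):
            # Incrementally accumulate the log-score of each key hypothesis.
            labels = AES_Sbox[plaintexts[i] ^ np.arange(256)]
            log_scores += np.log(predictions[i, labels] + 1e-40)
            order = np.argsort(log_scores)[::-1]
            curves[r, t] = np.where(order == correct_key)[0][0]
    return curves.mean(axis=0)   # mean rank after 1..n_traces traces
```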

prouff avatar Aug 25 '18 13:08 prouff

Sorry, I can't agree with you. I used the same partitioning and the same training script to train my models, and the same test script to evaluate their performance on the test set. Likewise, I only saved model parameters that were not overfitting on the validation set. I can show the performance of a few of my different models on the test set. Since I am studying deep learning, my understanding of overfitting may differ from yours; I hope we can continue to discuss this in depth.

[Figures: performance of best_model_desync50_inception_net.h5 and best_model_desync50_inception_resnet_net.h5 against ascad_desync50.h5]

GuanlinLee avatar Aug 25 '18 13:08 GuanlinLee

I am just saying that the lack of overfitting mentioned in the paper is somewhat inconsistent with what I observed in my experiments. I hope you can monitor the loss of the model on the validation set during training. Thanks.

GuanlinLee avatar Aug 25 '18 13:08 GuanlinLee

Regarding the mention of "lack of overfitting", do you refer to Sect. 3.3.4?

prouff avatar Aug 25 '18 13:08 prouff

Yes, of course.

GuanlinLee avatar Aug 25 '18 13:08 GuanlinLee

Ok. This section refers to an attack against the MASKED sbox output (meaning that we assume knowledge of the output mask rout to build the labelling). It is not exactly what you are attacking with the scripts (where knowledge of rout is not assumed).
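A hedged illustration of the two labellings being contrasted here; AES_Sbox is assumed to be the standard 256-entry AES S-box lookup table (e.g., as defined in the repository's ASCAD_generate.py), and plaintext/key/rout the metadata bytes of the targeted byte index.

```python
def label_unmasked(plaintext, key, AES_Sbox):
    # Labelling targeted by the provided attack scripts: Sbox(p ^ k),
    # no knowledge of the mask is assumed.
    return AES_Sbox[plaintext ^ key]

def label_masked(plaintext, key, rout, AES_Sbox):
    # Labelling discussed in Sect. 3.3.4: the MASKED S-box output
    # Sbox(p ^ k) ^ rout, which requires knowing the output mask rout.
    return AES_Sbox[plaintext ^ key] ^ rout
```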

prouff avatar Aug 25 '18 14:08 prouff

Thanks, I got it. However, I still can't understand why the model can give better results on the test set when its performance on the validation set has already degraded. This is very different from what I have seen in my work on computer vision, and I can't explain this phenomenon well. I would also like to know: are the keys in the training set and the test set completely different?

GuanlinLee avatar Aug 25 '18 14:08 GuanlinLee