Efficient-Computing
[RPG] Regarding the accuracy measurement
https://github.com/huawei-noah/Efficient-Computing/issues/139#issuecomment-2165075615
I created a new issue because I can't reopen the existing issue.
Hello, as mentioned in the link above, I successfully reproduced the CIFAR-10 results using the best accuracy, and I confirmed that it matches the accuracy in the paper's table. However, isn't the best accuracy possibly achieved at a sparsity different from the target sparsity? If using the best accuracy is correct, could you explain why it was used instead of the last-epoch accuracy?
Thank you very much for raising this question. Let me check... My basic intuition here was that the ImageNet accuracy fluctuates by around 0.1% over the last several epochs. Thus, I wrote code that simply reads the value of the 'best_prec1' key from the model state dict as the result, and I followed this convention for CIFAR-10. This worked perfectly well for ImageNet, but it did not occur to me that the practice could cause problems on CIFAR.
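For intuition, here is a minimal sketch (all names and numbers are hypothetical, not from the actual codebase) of why taking the best accuracy over all epochs can mislead under a gradual pruning schedule: early epochs run at lower sparsity, so their accuracy can exceed anything achievable at the final target sparsity.

```python
# Hypothetical sketch: comparing "best over all epochs" (analogous to
# reading a 'best_prec1'-style key) against "best among epochs at the
# target sparsity" (the accuracy that should actually be reported).

def select_reported_accuracy(history, target_sparsity, tol=1e-6):
    """history: list of (epoch, sparsity, top1_accuracy) tuples.

    Returns (best over all epochs, best restricted to target sparsity).
    """
    best_overall = max(acc for _, _, acc in history)
    at_target = [acc for _, s, acc in history
                 if abs(s - target_sparsity) <= tol]
    best_at_target = max(at_target) if at_target else None
    return best_overall, best_at_target

# Toy gradual-pruning run: sparsity ramps from 90% up to the 99.9% target.
history = [
    (1, 0.900, 91.2),  # still relatively dense -> highest accuracy
    (2, 0.990, 85.0),
    (3, 0.999, 76.3),  # target sparsity reached
    (4, 0.999, 76.8),
]
best_overall, best_at_target = select_reported_accuracy(history, 0.999)
# best_overall (91.2) overstates what the 99.9%-sparse model achieves;
# best_at_target (76.8) is the meaningful number.
```

The effect is negligible when the accuracy curve is flat near the end (as on ImageNet here), but becomes large at extreme sparsities where accuracy drops steeply once the target is reached.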
I will re-implement these experiments and try to figure the problem out.
I checked the experiments as follows: CIFAR-10 VGG19 experiments are correct at all sparsities; CIFAR-10 ResNet32 experiments are problematic at sparsity 99.5% and 99.9%; ImageNet ResNet-50 experiments are correct at all sparsities.
The following modifications will be made in the next couple of days:
- Update the code to fix this bug;
- Update ResNet-32 experiment results at sparsity 99.5% and 99.9%.
I really appreciate your effort in figuring this bug out!
Thank you for answering my question and confirming it with your own experiments. I will look forward to your code update. Thank you.
Hi Acasia,
Sorry for replying late! Here are some updates after thorough checking:
- The code has been updated; a patch has been added to resolve the problem.
- The latest table after correction is as follows:
| ResNet-32 Sparsity | 99% | 99.5% | 99.9% |
|---|---|---|---|
| ProbMask (Official) | 91.79 | 89.34 | 76.87 |
| ProbMask (Our replication) | 91.45 | 88.44 | 76.41 |
| AC/DC (Our replication) | 90.86 | 87.58 | 16.70 |
| RPG (Ours) | 91.61 | 89.13 | 71.09 |
Here are some additional comments:
- We are having difficulty replicating the ProbMask CIFAR-10 results. We used the official ProbMask code and tried various settings (batch size of 128 or 256; with or without pretrained weight loading), but a small gap with the official results remains. I guess the gap could be attributed to different device types (we use an Nvidia Titan Xp for CIFAR).
- We also updated the AC/DC results, because they were replicated with the same codebase and the same bug existed there.
- RPG performs better than ProbMask at all sparsities except the extreme sparsity of 99.9% on ResNet-32. A hypothesis is that a learnable soft mask could avoid weight/channel collapse when very few weights are retained. (It is also notable that ProbMask's training is 20 epochs longer than ours, because it contains an additional 20-epoch finetuning stage.)
- The manuscripts will be updated shortly.
Thanks again for raising this issue! I hope my response helps.