
Cannot reproduce results in the paper

Open maestrojeong opened this issue 4 years ago • 5 comments

Please refer to the CIFAR-10, ResNet-50 results in Table 2 of the paper.

When the volume budget is 12.5%, Table 2 reports 2.8% of the parameters and 5.1% of the FLOPs.

However, my reproduced result after the pruning step has 12.9% of the parameters and 16.9% of the FLOPs at the same 12.5% volume budget.

I followed the hyperparameter settings provided in the appendix.

Would you share the train/validation split information and the pretrained networks needed to reproduce the results in the paper?

maestrojeong avatar Aug 12 '21 10:08 maestrojeong

Hi, I believe you are using the visualize_model_architecture function in the utils file to get the parameter/FLOPs ratio. It only gives an approximation for params/FLOPs. We calculated the exact params and FLOPs by hand for the paper. However, I have updated the code, and you can rerun it to get accurate numbers.
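For reference, the exact parameter count of a pruned convolution only depends on how many input and output channels survive. A minimal sketch (not the code in the repo; the function and mask layout below are only illustrative):

import torch

def exact_conv_params(in_mask, out_mask, k=3):
    # Exact parameter count of a k x k conv whose surviving input/output
    # channels are given by binary masks (1 = kept, 0 = pruned).
    return int(in_mask.sum()) * int(out_mask.sum()) * k * k

# Example: a 64 -> 128 conv where half of each side survives pruning
in_mask = torch.tensor([1, 0] * 32)         # 32 of 64 input channels kept
out_mask = torch.tensor([1, 1, 0, 0] * 32)  # 64 of 128 output channels kept
print(exact_conv_params(in_mask, out_mask))  # 32 * 64 * 9 = 18432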

The train-val split is already given in the data_splits directory.

Our trained model weight for r50 cifar10 volume_ratio 12.5%: https://drive.google.com/file/d/1F4TtIaT0qT76Uz94a-GU0HUrlpZVR7lp/view?usp=sharing.

Please let us know if you have any other queries.

rishabh-16 avatar Aug 12 '21 16:08 rishabh-16

I re-evaluated my model and found that the number of parameters and the number of FLOPs are comparable to the values in the paper. However, I have further queries about the implementation.

In calc_flops():

ans+=current_max*a[current_loc]*9*size**2 + a[current_loc]*size**2
ans+=a[current_loc]*a[current_loc+1]*9*size**2 + a[current_loc+1]*size**2

It seems you add the FLOPs of the convolution operation and the FLOPs of the batchnorm operation. In get_flops(), you consider only the FLOPs of the convolution operation.
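For reference, the two terms look like the standard per-layer counts, assuming a[...] holds the active channel counts and size is the spatial resolution (my own sketch, not the repo's code):

def conv3x3_flops(c_in, c_out, size):
    # Multiply-accumulates of a 3x3 convolution over a size x size feature map
    return c_in * c_out * 9 * size ** 2

def batchnorm_flops(c_out, size):
    # One scale-and-shift per activation (the a[current_loc]*size**2-style term)
    return c_out * size ** 2

# e.g. 32 active inputs, 48 active outputs on a 16x16 feature map
print(conv3x3_flops(32, 48, 16) + batchnorm_flops(48, 16))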

The (activated_FLOPs / total_FLOPs) result of calc_flops is about (100M / 2.4G). On the other hand, the result of get_flops is about (50M / 300M). When we ignore the FLOPs of the batchnorm operation, the value from calc_flops should be double the value from get_flops because of the final 2*ans. However, the ratio of the total FLOPs between the two functions is about 8x. Where does this big difference come from?

Also, I have a question about this code:

current_max = max(downsample_n, a[current_loc+1])

It seems that you account for the number of activated input channels coming from the skip addition. However, I believe the max operation is not valid. Suppose the activated channels from the convolution operation are [1, 0, 1, 0] and the activated channels from the skip addition are [1, 1, 0, 0]. Then the activated input channels should be [1, 1, 1, 0], i.e. three channels, whereas the max operation yields 2 (= max(2, 2)). Do I have some misunderstanding of this code?
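For concreteness, a tiny sketch of the difference (the masks below are made up):

import torch

conv_mask = torch.tensor([1, 0, 1, 0])  # channels activated by the convolution
skip_mask = torch.tensor([1, 1, 0, 0])  # channels activated by the skip addition

union_count = int((conv_mask | skip_mask).sum())             # [1, 1, 1, 0] -> 3
max_count = max(int(conv_mask.sum()), int(skip_mask.sum()))  # max(2, 2)    -> 2
print(union_count, max_count)  # 3 2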

Thank you!

maestrojeong avatar Aug 14 '21 20:08 maestrojeong

Hi, the get_flops function is just an approximation that we used at pruning time for the FLOP calculation. During the fine-tuning phase, calc_flops is used to get the final thresholds for zeta. Also, get_flops uses the soft values of zeta, which is why you might be seeing this huge difference between get_flops and calc_flops.
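As a toy illustration of that difference (the zeta values and the 0.5 threshold below are made up, not the ones used in the code):

import torch

zeta = torch.tensor([0.9, 0.7, 0.4, 0.3])  # soft channel gates (illustrative values)

soft_count = float(zeta.sum())        # ~2.3 -- soft counting, roughly what get_flops does
hard_count = int((zeta > 0.5).sum())  # 2    -- counting after thresholding, as in calc_flops
print(soft_count, hard_count)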

Please refer to the improved_flops branch; we are currently working there on including the correct FLOP calculation for the skip connections. A more elegant approach can be found in that branch, which you may use. Please do let me know if you have any other queries.

Thanks.

ubamba98 avatar Aug 16 '21 09:08 ubamba98

I found some critical errors in the code.

1. size = self.insize*2 => size = self.insize

   https://github.com/transmuteAI/ChipNet/blob/master/models/resnet.py#L378

   This makes the total FLOPs 20 times bigger than the total FLOPs of get_flops(). You need to revise the total FLOP numbers in the paper (a toy check of the size**2 scaling follows this list).

2. self.prev_module is not correctly considered in the ResNet model.
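Since size enters each FLOP term quadratically, passing self.insize*2 instead of self.insize already inflates every such term by a factor of 4 (made-up layer sizes below):

insize = 32
c_in, c_out = 64, 64
correct = c_in * c_out * 9 * insize ** 2       # with size = self.insize
buggy = c_in * c_out * 9 * (insize * 2) ** 2   # with size = self.insize*2
print(buggy / correct)  # 4.0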

maestrojeong avatar Aug 20 '21 22:08 maestrojeong

Thanks for pointing this out. For ResNet-50, our approximation of the FLOPs might be too crude. We will look into it for further clarification and update the scores in the repo if anything changes.

ubamba98 avatar Aug 28 '21 14:08 ubamba98