SSL
Is the initial determination of which filters are unimportant based on the L1 norm of the weights?
Should --prune_criterion l1-norm be added in ./scripts/dist_train.sh? I noticed that the default prune_criterion is act_scale:
parser.add_argument('--prune_criterion', type=str, default='act_scale', choices=['l1-norm', 'act_scale'])
My understanding of the entire pruning process is as follows. First, for a well-trained model, the L1 norm of the convolutional kernel weights is used to select which layers are to be pruned. Then, sparse training is performed with scaling factors added, targeting the scaling factors that correspond to the unimportant layers. Finally, pruning is executed to remove the identified layers. I'm not sure if my understanding is accurate.
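
To make the question concrete, here is a minimal sketch of what I mean by L1-norm selection. It is not the repository's code: only the add_argument call is taken from the source, while build_parser, abs_sum, l1_norm_per_filter, least_important, and the toy weights are hypothetical names and values made up for illustration. It assumes the score is the sum of absolute weights per output filter.

```python
import argparse


def build_parser():
    # Same argument definition as quoted above; the wrapper function is hypothetical.
    parser = argparse.ArgumentParser()
    parser.add_argument('--prune_criterion', type=str, default='act_scale',
                        choices=['l1-norm', 'act_scale'])
    return parser


def abs_sum(x):
    # Recursively sum absolute values over a nested list of numbers.
    if isinstance(x, (list, tuple)):
        return sum(abs_sum(v) for v in x)
    return abs(x)


def l1_norm_per_filter(weight):
    # weight is laid out as [out_channels][in_channels][kh][kw];
    # each output filter gets one score, the L1 norm of its weights.
    return [abs_sum(f) for f in weight]


def least_important(scores, num_prune):
    # Indices of the num_prune filters with the smallest scores.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:num_prune])


# Toy convolution weight: 3 filters, 1 input channel, 2x2 kernels (made-up values).
toy_weight = [
    [[[0.5, -0.5], [1.0, -1.0]]],
    [[[0.1, 0.0], [-0.1, 0.0]]],
    [[[2.0, 2.0], [-2.0, 2.0]]],
]

# Passing the flag explicitly overrides the act_scale default.
args = build_parser().parse_args(['--prune_criterion', 'l1-norm'])
scores = l1_norm_per_filter(toy_weight)
pruned = least_important(scores, num_prune=1)
print(args.prune_criterion, scores, pruned)
```

With no flag given, the parser falls back to act_scale, which is why I am asking whether the script has to pass l1-norm explicitly.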