DeepSpeed Add snip_momentum structured pruning which supports higher sparse ratio

This PR contributes the snip_momentum pruning algorithm from Intel Neural Compressor to DeepSpeed compression, as we proposed in the RFC.

The snip_momentum algorithm is described here.
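A minimal sketch of the scoring idea, assuming the importance score is a momentum-smoothed SNIP saliency |weight * grad| accumulated over training steps. The function name `update_snip_momentum_scores` and the coefficients `alpha` and `beta` (and their defaults) are illustrative assumptions, not the code added in this PR:

```python
# Illustrative sketch only: the names and default coefficients below are
# assumptions, not the DeepSpeed or Intel Neural Compressor implementation.

def update_snip_momentum_scores(scores, weights, grads, alpha=0.9, beta=1.0):
    """Return alpha * old_score + beta * |weight * grad| for each element."""
    return [alpha * s + beta * abs(w * g)
            for s, w, g in zip(scores, weights, grads)]


# Accumulate scores over two hypothetical training steps.
scores = [0.0, 0.0, 0.0]
weights = [0.5, -2.0, 1.0]
for grads in ([0.2, 0.1, -0.3], [0.4, 0.0, -0.1]):
    scores = update_snip_momentum_scores(scores, weights, grads)
```

Weights with the lowest accumulated scores are the candidates for pruning. The momentum term keeps a single noisy gradient from deciding which weights are removed.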

We tested it on DeepSpeedExamples/compression/bert with a newly added script, bash_script/pruning_sparse_snip_momentum.sh, and got the results below. The changes to the examples are here.

pattern	sparsity ratio	pruning method	epochs	acc/mm-acc
1x1	80%	DeepSpeed L1	2	0.8113/0.822
1x1	80%	snip_momentum	2	0.8176/0.822
4x1	80%	snip_momentum	10	0.8248/0.8305
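To illustrate what a block pattern such as 4x1 at 80% sparsity means, here is a small sketch that zeroes whole blocks with the lowest summed scores. The helper `block_prune_mask`, the block orientation (4 rows by 1 column), and the example scores are assumptions made for illustration and are not taken from this PR:

```python
# Illustrative sketch only: helper name, block orientation, and scores are
# hypothetical, not the implementation in this PR.

def block_prune_mask(scores, block_rows, block_cols, sparsity):
    """Return a 0/1 mask that drops the lowest-scoring blocks as whole units."""
    rows, cols = len(scores), len(scores[0])
    assert rows % block_rows == 0 and cols % block_cols == 0

    # Sum the element scores inside each block.
    block_scores = {}
    for i in range(rows):
        for j in range(cols):
            key = (i // block_rows, j // block_cols)
            block_scores[key] = block_scores.get(key, 0.0) + scores[i][j]

    # Prune the requested fraction of blocks, lowest scores first.
    n_prune = round(len(block_scores) * sparsity)
    pruned = set(sorted(block_scores, key=block_scores.get)[:n_prune])

    return [[0 if (i // block_rows, j // block_cols) in pruned else 1
             for j in range(cols)]
            for i in range(rows)]


example_scores = [
    [0.1, 0.9, 0.2, 0.3, 0.1],
    [0.2, 0.8, 0.1, 0.3, 0.1],
    [0.1, 0.7, 0.2, 0.2, 0.1],
    [0.1, 0.9, 0.1, 0.1, 0.2],
]
# Five 4x1 blocks (one per column); 80% sparsity removes four of them.
mask = block_prune_mask(example_scores, block_rows=4, block_cols=1, sparsity=0.8)
```

With a 1x1 pattern the same routine degenerates to unstructured, element-wise pruning.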

cc @hshen14 @wenhuach21

Apr 19 '23 03:04 ftian1

@microsoft-github-policy-service agree company="Intel"

Apr 19 '23 04:04 ftian1

Because different algorithms may not share the same best hyperparameters, we also tried other settings. The main differences are that we use only the second-to-last layer for distillation and change the learning rate.

pattern	sparsity ratio	pruning method	epochs	acc/mm-acc
4x1	80%	snip_momentum	2	0.8284/0.8388
4x1	80%	snip_momentum	6	0.8339/0.8418

Apr 19 '23 04:04 wenhuach21

Tested the accuracy and it looks great.

May 08 '23 16:05 xiaoxiawu-microsoft

@ftian1, there is a formatting issue on the PR. The pre-commit needs to be run and the file changes committed to the branch. In particular, the following needs to be run on the repo:

pre-commit run --all-files

Contributing - DeepSpeed

May 08 '23 17:05 xiaoxiawu-microsoft

@xiaoxiawu-microsoft Sorry for the late response due to the PRC holiday, and thanks for your review.

I have fixed the yapf scan issue. However, in my local environment the check for destroyed symlinks always fails after merging master. I am not sure why, as everything looks good, so I have pushed the code first. I hope it will not waste pre-CI resources.

May 09 '23 07:05 ftian1

@xiaoxiawu-microsoft Those pre-CI errors are not related to my changes. Could you please take a look?

May 10 '23 05:05 ftian1