Can this method be used on BLOOM models?
It is general for Transformer-based large language models. We evaluate mostly on LLaMA in our paper because of its superior performance; we have additional results on Pythia and OPT in the appendix.
As for BLOOM, do you have a particular model in mind? If so, could you share the Hugging Face model id? I can help look into pruning it with our approach, Wanda.
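For context, the scoring rule is simple to state: each weight `W_ij` is scored by `|W_ij| * ||X_j||_2` (the weight magnitude times the L2 norm of its input activation over calibration tokens), and the lowest-scoring weights are removed within each output row. Below is a minimal PyTorch sketch of that rule for a single linear layer, assuming the calibration activations are already collected; `prune_linear_wanda` and its arguments are illustrative names, not the repo's actual API (see the code at locuslab/wanda for the real implementation).

```python
import torch
import torch.nn as nn

@torch.no_grad()
def prune_linear_wanda(layer: nn.Linear, calib_inputs: torch.Tensor,
                       sparsity: float = 0.5) -> None:
    """Zero out the lowest-scoring weights of `layer` in place.

    calib_inputs: (num_tokens, in_features) activations gathered from a
    small calibration set (the paper uses C4 samples).
    """
    W = layer.weight.data                      # (out_features, in_features)
    # Per-input-feature activation norm ||X_j||_2 over calibration tokens.
    x_norm = calib_inputs.norm(p=2, dim=0)     # (in_features,)
    # Wanda score: |W_ij| * ||X_j||_2, broadcast across output rows.
    score = W.abs() * x_norm.unsqueeze(0)      # (out_features, in_features)
    # Compare scores per output row; drop the lowest `sparsity` fraction.
    k = int(W.shape[1] * sparsity)
    _, prune_idx = torch.topk(score, k, dim=1, largest=False)
    mask = torch.zeros_like(W, dtype=torch.bool).scatter_(1, prune_idx, True)
    W[mask] = 0.0

# Toy example: 50% unstructured sparsity on a random layer.
layer = nn.Linear(512, 512)
calib = torch.randn(2048, 512)                 # stand-in for real activations
prune_linear_wanda(layer, calib, sparsity=0.5)
print((layer.weight == 0).float().mean())      # ~0.5
```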
Thank you for your reply. I would like to know the pruning results for the following models of different sizes on Hugging Face: bigscience/bloom-7b1, bigscience/bloom-3b, and bigscience/bloom-1b7.
Hi, we have some results on BLOOM models, summarized here (perplexity, unstructured 50% sparsity):
| BLOOM | 560M | 1.1B | 1.7B | 3B | 7.1B |
|---|---|---|---|---|---|
| Dense | 22.42 | 17.68 | 15.39 | 13.48 | 11.37 |
| Magnitude | 2e10 | 1e6 | 2e5 | 8e6 | 2e6 |
| SparseGPT | 28.92 | 21.35 | 18.88 | 16.76 | 13.96 |
| Wanda | 30.74 | 22.72 | 19.79 | 16.45 | 13.55 |
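If you want to try this on the checkpoints above, the missing ingredient relative to the sketch earlier is collecting the per-feature activation norms from a calibration pass. A rough sketch using a forward hook on one BLOOM linear layer follows; the module path matches the `transformers` BLOOM naming, but the one-sentence calibration input is only a placeholder for real C4 samples, so treat this as illustrative rather than the repo's actual pipeline.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloom-560m"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# One nn.Linear inside the first BLOOM block.
target = model.transformer.h[0].self_attention.query_key_value
sq_sum = torch.zeros(target.in_features)

def hook(module, inputs, output):
    # Flatten (batch, seq, hidden) -> (tokens, in_features) and
    # accumulate squared activations per input feature.
    x = inputs[0].reshape(-1, module.in_features)
    sq_sum.add_((x.float() ** 2).sum(dim=0))

h = target.register_forward_hook(hook)
batch = tok("Calibration text goes here.", return_tensors="pt")
model(**batch)
h.remove()

x_norm = sq_sum.sqrt()   # ||X_j||_2, feeds into the scoring step above
```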