Can this method be used on BLOOM models?
It is general for Transformer-based large language models. We evaluate mostly on LLaMA in our paper because of its superior performance; we have additional results on Pythia and OPT in the appendix.
As for BLOOM, do you have a particular model in mind? If so, could you share the Hugging Face model id? I can help look into pruning it with our approach, Wanda.
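For context, the scoring rule is simple to state: each weight `W_ij` is scored by `|W_ij| * ||X_j||_2` (the weight magnitude times the L2 norm of its input activation over calibration tokens), and the lowest-scoring weights are removed within each output row. Below is a minimal PyTorch sketch of that rule for a single linear layer, assuming the calibration activations are already collected; `prune_linear_wanda` and its arguments are illustrative names, not the repo's actual API (see the code at locuslab/wanda for the real implementation).

```python
import torch
import torch.nn as nn

@torch.no_grad()
def prune_linear_wanda(layer: nn.Linear, calib_inputs: torch.Tensor,
                       sparsity: float = 0.5) -> None:
    """Zero out the lowest-scoring weights of `layer` in place.

    calib_inputs: (num_tokens, in_features) activations gathered from a
    small calibration set (the paper uses C4 samples).
    """
    W = layer.weight.data                      # (out_features, in_features)
    # Per-input-feature activation norm ||X_j||_2 over calibration tokens.
    x_norm = calib_inputs.norm(p=2, dim=0)     # (in_features,)
    # Wanda score: |W_ij| * ||X_j||_2, broadcast across output rows.
    score = W.abs() * x_norm.unsqueeze(0)      # (out_features, in_features)
    # Compare scores per output row; drop the lowest `sparsity` fraction.
    k = int(W.shape[1] * sparsity)
    _, prune_idx = torch.topk(score, k, dim=1, largest=False)
    mask = torch.zeros_like(W, dtype=torch.bool).scatter_(1, prune_idx, True)
    W[mask] = 0.0

# Toy example: 50% unstructured sparsity on a random layer.
layer = nn.Linear(512, 512)
calib = torch.randn(2048, 512)                 # stand-in for real activations
prune_linear_wanda(layer, calib, sparsity=0.5)
print((layer.weight == 0).float().mean())      # ~0.5
```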
Thank you for your reply. I would like to know the pruning results for the following models of different sizes on Hugging Face: bigscience/bloom-7b1, bigscience/bloom-3b, and bigscience/bloom-1b7.
Hi, we have some results on BLOOM models, summarized here (perplexity, unstructured 50% sparsity):
| BLOOM | 560M | 1.1B | 1.7B | 3B | 7.1B |
|---|---|---|---|---|---|
| Dense | 22.42 | 17.68 | 15.39 | 13.48 | 11.37 |
| Magnitude | 2e10 | 1e6 | 2e5 | 8e6 | 2e6 |
| SparseGPT | 28.92 | 21.35 | 18.88 | 16.76 | 13.96 |
| Wanda | 30.74 | 22.72 | 19.79 | 16.45 | 13.55 |
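If you want to try this on the checkpoints above, the missing ingredient relative to the sketch earlier is collecting the per-feature activation norms from a calibration pass. A rough sketch using a forward hook on one BLOOM linear layer follows; the module path matches the `transformers` BLOOM naming, but the one-sentence calibration input is only a placeholder for real C4 samples, so treat this as illustrative rather than the repo's actual pipeline.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloom-560m"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# One nn.Linear inside the first BLOOM block.
target = model.transformer.h[0].self_attention.query_key_value
sq_sum = torch.zeros(target.in_features)

def hook(module, inputs, output):
    # Flatten (batch, seq, hidden) -> (tokens, in_features) and
    # accumulate squared activations per input feature.
    x = inputs[0].reshape(-1, module.in_features)
    sq_sum.add_((x.float() ** 2).sum(dim=0))

h = target.register_forward_hook(hook)
batch = tok("Calibration text goes here.", return_tensors="pt")
model(**batch)
h.remove()

x_norm = sq_sum.sqrt()   # ||X_j||_2, feeds into the scoring step above
```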