
Pruning Llama2-7B

Open acalatrava opened this issue 1 year ago • 4 comments

I’ve tried to prune Llama2-7B on a MacBook Pro M1, but the system kills the process due to OOM (I have 32GB of RAM).

Is there something I can do? Has somebody already pruned this model and published it?

Thank you!

acalatrava avatar Aug 09 '23 22:08 acalatrava

Hi.

Pruning needs around 80G of memory if you use the Taylor pruner, since it has to compute gradients for the whole model. Other pruners, like L2 or random, do not require nearly as much memory, but their pruning quality is noticeably worse.
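To illustrate why the Taylor pruner is so much more memory-hungry than L2: Taylor importance multiplies each weight by its gradient, which requires a full backward pass, while L2 importance is computed from the weights alone. A minimal sketch in plain Python (the function and channel names are illustrative, not LLM-Pruner's API):

```python
import math

def l2_importance(channel_weights):
    # L2 pruner: importance is just the weight norm of the channel.
    # No gradients are needed, so only forward weights sit in memory.
    return math.sqrt(sum(w * w for w in channel_weights))

def taylor_importance(channel_weights, channel_grads):
    # Taylor pruner: first-order importance |w * dL/dw|, summed per channel.
    # Obtaining channel_grads needs a backward pass over the whole model,
    # which is what pushes 7B-scale pruning toward the ~80G figure above.
    return sum(abs(w * g) for w, g in zip(channel_weights, channel_grads))

weights = [0.5, -1.0, 0.25]
grads = [0.1, -0.2, 0.4]
print(l2_importance(weights))
print(taylor_importance(weights, grads))
```

Channels with the lowest importance score are the ones a structured pruner removes; the trade-off in the answer above is that the cheap L2 score ignores the loss entirely, which is why its pruned models perform worse.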

horseee avatar Aug 10 '23 10:08 horseee

@horseee Can I use multiple GPUs to prune Llama2-7B? I have 4 A40s. hf_prune.py doesn't seem to use multiple GPUs. Thank you!

XieWeikai avatar Dec 18 '23 01:12 XieWeikai

> @horseee Can I use multiple GPUs to prune Llama2-7B? I have 4 A40s. hf_prune.py doesn't seem to use multiple GPUs. Thank you!

Hi, did you fix the problem? I also encountered a similar one.

zhangyao1627-zhang avatar Mar 29 '24 22:03 zhangyao1627-zhang

No answer means it can't be done.
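The thread never resolves the multi-GPU question. Independent of LLM-Pruner, the usual workaround when gradient computation doesn't fit on one GPU is to shard the decoder layers across devices (e.g. transformers' `device_map="auto"` builds such a placement automatically). A hypothetical sketch of a balanced layer-to-device map, assuming 4 GPUs (the function name is illustrative):

```python
def layer_device_map(num_layers, num_gpus):
    # Assign contiguous blocks of transformer layers to each GPU so that
    # activations flow from device to device in order (pipeline-style sharding).
    per_gpu = -(-num_layers // num_gpus)  # ceiling division
    return {layer: min(layer // per_gpu, num_gpus - 1)
            for layer in range(num_layers)}

# Llama2-7B has 32 decoder layers; spread them over 4 A40s.
mapping = layer_device_map(32, 4)
```

With a map like this, a loader such as accelerate can place each block on its device before pruning; whether hf_prune.py accepts such a map is not confirmed anywhere in this thread.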

BrownTan avatar Sep 09 '24 09:09 BrownTan