
Pruning Llama2-7B

Open acalatrava opened this issue 1 year ago • 4 comments

I’ve tried to prune Llama2-7B on a MacBook Pro M1, but the system kills the process due to OOM (I have 32GB of RAM).

Is there something I can do? Has somebody already pruned this model and published it?

Thank you!

acalatrava avatar Aug 09 '23 22:08 acalatrava

Hi.

Pruning needs around 80G of memory if you use the Taylor pruner, since it has to compute gradients for the whole model. Other pruners, like L2 or random, do not require nearly as much memory, but their pruning quality is noticeably worse.
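To illustrate why the Taylor pruner is so much more memory-hungry than L2: Taylor importance multiplies each weight by its gradient, which requires a full backward pass, while L2 importance is computed from the weights alone. A minimal sketch in plain Python (the function and channel names are illustrative, not LLM-Pruner's API):

```python
import math

def l2_importance(channel_weights):
    # L2 pruner: importance is just the weight norm of the channel.
    # No gradients are needed, so only forward weights sit in memory.
    return math.sqrt(sum(w * w for w in channel_weights))

def taylor_importance(channel_weights, channel_grads):
    # Taylor pruner: first-order importance |w * dL/dw|, summed per channel.
    # Obtaining channel_grads needs a backward pass over the whole model,
    # which is what pushes 7B-scale pruning toward the ~80G figure above.
    return sum(abs(w * g) for w, g in zip(channel_weights, channel_grads))

weights = [0.5, -1.0, 0.25]
grads = [0.1, -0.2, 0.4]
print(l2_importance(weights))
print(taylor_importance(weights, grads))
```

Channels with the lowest importance score are the ones a structured pruner removes; the trade-off in the answer above is that the cheap L2 score ignores the loss entirely, which is why its pruned models perform worse.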

horseee avatar Aug 10 '23 10:08 horseee

@horseee Can I use multiple GPUs to prune Llama2-7B? I have 4 A40s. hf_prune.py doesn't seem to use multiple GPUs. Thank you!

XieWeikai avatar Dec 18 '23 01:12 XieWeikai

> @horseee Can I use multiple GPUs to prune Llama2-7B? I have 4 A40s. hf_prune.py doesn't seem to use multiple GPUs. Thank you!

Hi, did you fix the problem? I also encountered a similar one.

zhangyao1627-zhang avatar Mar 29 '24 22:03 zhangyao1627-zhang

No answer means it can't be done.
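The thread never resolves the multi-GPU question. Independent of LLM-Pruner, the usual workaround when gradient computation doesn't fit on one GPU is to shard the decoder layers across devices (e.g. transformers' `device_map="auto"` builds such a placement automatically). A hypothetical sketch of a balanced layer-to-device map, assuming 4 GPUs (the function name is illustrative):

```python
def layer_device_map(num_layers, num_gpus):
    # Assign contiguous blocks of transformer layers to each GPU so that
    # activations flow from device to device in order (pipeline-style sharding).
    per_gpu = -(-num_layers // num_gpus)  # ceiling division
    return {layer: min(layer // per_gpu, num_gpus - 1)
            for layer in range(num_layers)}

# Llama2-7B has 32 decoder layers; spread them over 4 A40s.
mapping = layer_device_map(32, 4)
```

With a map like this, a loader such as accelerate can place each block on its device before pruning; whether hf_prune.py accepts such a map is not confirmed anywhere in this thread.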

BrownTan avatar Sep 09 '24 09:09 BrownTan