LLM-Pruner
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Supports Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
How to prune LLMs with Multi-Query Attention?
I ran LLM-Pruner with the command specified in the README to prune LLaMA-7B:

```bash
python hf_prune.py --pruning_ratio 0.25 \
    --block_wise \
    --block_mlp_layer_start 4 --block_mlp_layer_end 30 \
    --block_attention_layer_start 4 --block_attention_layer_end...
```
Apologies if this has been asked before, but do you have pruned models that we can test and run locally? Anything on the Hugging Face Hub? I'd like to test some...
Is there a way to force the pruning to remove the same number of parameters from all layers? This would make the resulting model compatible with the HF implementation (loadable via from_pretrained).
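A quick way to see whether a given run produced uniform widths (the property from_pretrained needs) is to inspect the saved model's per-layer shapes. A minimal sketch, assuming a LLaMA-style checkpoint saved whole with `torch.save` as in this repo's examples; the path and the saved-dict layout are assumptions:

```python
import torch

# Hypothetical path; LLM-Pruner's example scripts save the pruned model object
# with torch.save rather than in standard HF format.
ckpt = torch.load("prune_log/llama_prune/pytorch_model.bin", map_location="cpu")
model = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt

# Collect the shape of every MLP gate projection; a single distinct shape
# means every layer was pruned by the same amount.
shapes = {tuple(p.shape) for n, p in model.named_parameters()
          if n.endswith("mlp.gate_proj.weight")}
print("distinct gate_proj shapes:", shapes)
```

If more than one shape comes back, the layers have different widths, so no single `intermediate_size` in the config can describe the model, which is exactly why the stock HF loader rejects it.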
I simply run the following command:

```bash
python hf_prune.py --pruning_ratio 0.62785 --block_wise \
    --block_mlp_layer_start 0 --block_mlp_layer_end 32 \
    --block_attention_layer_start 32 --block_attention_layer_end 32 \
    --pruner_type taylor --base_model /mnt/petrelfs/xxx/llama2-7b \
    --device cpu --eval_device cuda --taylor...
```
```
Traceback (most recent call last):
  File "/home/jovyan/honor/yangdong/LLM-Pruner-main/examples/baichuan.py", line 342, in <module>
    main(args)
  File "/home/jovyan/honor/yangdong/LLM-Pruner-main/examples/baichuan.py", line 229, in main
    pruner.step()
  File "/home/jovyan/honor/yangdong/LLM-Pruner-main/LLMPruner/torch_pruning/pruner/algorithms/metapruner.py", line 186, in step
    for group in self.prune_local():
  File "/home/jovyan/honor/yangdong/LLM-Pruner-main/LLMPruner/torch_pruning/pruner/algorithms/metapruner.py", ...
```
evaluate
Hello, after pruning, I fine-tuned the model on the [alpaca_data_zh_51k](https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/data/alpaca_data_zh_51k.json) dataset. How can I evaluate the performance of the fine-tuned model on alpaca_data_zh_51k? Thanks.
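One dataset-specific option, independent of this repo's own evaluation scripts, is to hold out part of alpaca_data_zh_51k and measure perplexity on it. A minimal sketch, assuming the fine-tuned model has been exported in Hugging Face format; the checkpoint path and the 500-sample held-out split are assumptions:

```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/pruned_finetuned_model"  # hypothetical checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16
).cuda().eval()

with open("alpaca_data_zh_51k.json") as f:
    data = json.load(f)
held_out = data[-500:]  # assumption: the tail is reserved as an eval split

nll_sum, n_tokens = 0.0, 0
with torch.no_grad():
    for sample in held_out:
        # Alpaca-style records have instruction / input / output fields.
        text = sample["instruction"] + sample.get("input", "") + sample["output"]
        ids = tokenizer(text, return_tensors="pt",
                        truncation=True, max_length=1024).input_ids.cuda()
        loss = model(ids, labels=ids).loss  # mean per-token NLL
        nll_sum += loss.item() * ids.numel()
        n_tokens += ids.numel()

print(f"held-out perplexity: {torch.exp(torch.tensor(nll_sum / n_tokens)).item():.2f}")
```

Strictly speaking, the held-out split must be excluded from fine-tuning, otherwise the perplexity will be optimistic; for instruction-following quality beyond perplexity, a standard harness such as lm-evaluation-harness on public benchmarks is the usual complement.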