LLM-Pruner
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
I am trying to evaluate the perplexity of Llama-2-13B on WikiText-2. When using the script from [GitHub - yxli2123/LoftQ](https://github.com/yxli2123/LoftQ), I get a perplexity of 12.02. However, when using the...
The command I run:

```
python llama3.py --pruning_ratio 0.25 \
    --device cuda --eval_device cuda \
    --base_model home/Meta-Llama-3-8B \
    --block_wise --block_mlp_layer_start 4 --block_mlp_layer_end 30 \
    --block_attention_layer_start 4 --block_attention_layer_end 30 \
    --save_ckpt_log_name...
```
Hi there! How can I evaluate the PPL on "wikitext2,ptb" with the post-trained model?
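A minimal sketch of one way to measure WikiText-2 perplexity with plain `transformers`/`datasets` (this is not the repo's own evaluator; the window size and dataset config are assumptions, and PTB would follow the same pattern with its own dataset):

```python
import math
import torch
from datasets import load_dataset

def eval_wikitext2_ppl(model, tokenizer, seq_len=2048, device="cuda"):
    """Rough perplexity estimate: mean NLL over non-overlapping windows."""
    text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
    ids = tokenizer(text, return_tensors="pt").input_ids
    losses = []
    model.eval()
    for i in range(0, ids.size(1) - seq_len, seq_len):
        chunk = ids[:, i : i + seq_len].to(device)
        with torch.no_grad():
            losses.append(model(chunk, labels=chunk).loss.item())
    return math.exp(sum(losses) / len(losses))

# Usage: load the post-trained model however it was saved (e.g. the pruned
# checkpoint with the tuned LoRA weights merged back in), then:
# print("wikitext2 PPL:", eval_wikitext2_ppl(model, tokenizer))
```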
I'm running post-training on a pruned model. After post-training, I get degraded performance (e.g., MMLU drops to 24%). Is this expected?

```
MODEL=meta-llama/Llama-2-7b-hf
prune_ckpt_path='llama_prune'
tune_ckpt_path='model'
RATIO=0.10
# Pruning...
```
In the README, `--pruning_ratio 0.25` is used and it's mentioned that it prunes 20% of the parameters. Why is this? If I want to prune 10%, should I use `--pruning_ratio 0.15`?
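One plausible explanation (the back-of-the-envelope numbers below are assumptions, not measurements): `--pruning_ratio` applies only to the transformer blocks selected by `--block_*_layer_start/end`, while the embeddings, `lm_head`, and the excluded first/last layers keep all their parameters, so the end-to-end reduction comes out lower than the per-block ratio.

```python
# Back-of-the-envelope arithmetic with assumed parameter counts (LLaMA-7B-ish).
total_params    = 6.7e9   # assumed total parameter count
prunable_params = 5.4e9   # assumed params inside the selected layers (attn + MLP)
ratio           = 0.25    # --pruning_ratio

overall_reduction = prunable_params * ratio / total_params
print(f"overall reduction ≈ {overall_reduction:.0%}")   # ≈ 20%, not 25%

# Inverting the same relation to target roughly 10% of all parameters:
target = 0.10
print(f"per-block ratio needed ≈ {target * total_params / prunable_params:.3f}")
```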
Issue resolved. The problem is that when constructing the trainer, `save_safetensors=False` should be set; otherwise, the `safe_serialization=False` mentioned above will not work. https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/trainer#transformers.TrainingArguments.save_safetensors _Originally posted by @WilliamYi96 in https://github.com/horseee/LLM-Pruner/issues/45#issuecomment-1867980732_ I use...
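For reference, a minimal sketch of the fix described above (the output directory and the omitted hyperparameters are placeholders): `save_safetensors` is a `TrainingArguments` field, so disabling safetensors has to happen where the Trainer is constructed, not only in the later `save_pretrained(safe_serialization=False)` call.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tune_log/model",   # placeholder path
    save_safetensors=False,        # keep .bin checkpoints; without this,
                                   # safe_serialization=False elsewhere is ignored
    # ... remaining hyperparameters as in the post-training script ...
)
# Pass `training_args` when constructing the Trainer in the post-training script.
```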
Thanks for sharing. How should the `consecutive_groups` parameter be understood? MetaPruner in Torch-Pruning does not have this parameter.
Hi there, nice work! I've been tinkering with the repo, and came across some issues when trying to fully utilize the available resources. For example, I've learned that there is...
Hi! Is it possible to save the model and create custom configuration files so that we can push it to Hugging Face and load it? Also, can PEFT be used directly from...
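For context, a sketch of how a structurally pruned checkpoint is typically reloaded and wrapped with PEFT (the checkpoint path, the dictionary key, and the LoRA hyperparameters below are assumptions): since pruning changes per-layer shapes, the original `config.json` no longer describes the model, so the whole module object is usually pickled and reloaded rather than rebuilt via `from_pretrained`.

```python
import torch
from peft import LoraConfig, get_peft_model

# Assumed layout: the pruning step pickled the full model object to this path.
ckpt = torch.load(
    "prune_log/llama_prune/pytorch_model.bin",
    map_location="cpu",
    weights_only=False,      # needed on newer torch to unpickle a full nn.Module
)
model = ckpt["model"]        # key assumed; adjust to how the checkpoint was saved

# LoRA can then be attached to the loaded object for fine-tuning.
lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # LLaMA attention projections (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()
```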