
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.

Results: 54 LLM-Pruner issues, sorted by recently updated

Thanks for your nice work! When I post-train the pruned model by running `python post_training.py --prune_model prune_log/pytorch_model.bin --data_path yahma/alpaca-cleaned --output_dir tune_log --wandb_project llama_tune --lora_r 8 --num_epochs 2 --learning_rate 1e-4 --batch_size...

If I want to further quantize the pruned model, how should I proceed? I saw this mentioned in the paper.
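The paper mentions quantization as a follow-up step; one low-effort route (an illustrative sketch, not the authors' pipeline — the tiny module below is a hypothetical stand-in for a pruned model loaded via `torch.load`) is PyTorch dynamic int8 quantization:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pruned model loaded with torch.load.
pruned_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
pruned_model.eval()

# Dynamic quantization: Linear weights are stored as int8, and activations
# are quantized on the fly at inference time. CPU-only, no calibration data.
quantized = torch.ao.quantization.quantize_dynamic(
    pruned_model, {nn.Linear}, dtype=torch.qint8
)

out = quantized(torch.randn(1, 8))
print(out.shape)  # torch.Size([1, 4])
```

For serious deployments, GPTQ-style weight quantization would likely preserve more accuracy than dynamic quantization, but it requires calibration data.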

I pruned 25% of all the layers, but the resulting shape is not what I wanted: I expected [N, N], but got [N, M] with M = N*0.25, which makes the model difficult to load.
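This is expected with structured pruning: whole channels are removed, so one dimension of the weight shrinks rather than the matrix staying square. A minimal sketch (hypothetical layer sizes, keeping rows by index rather than using LLM-Pruner's importance scores) of what happens to a Linear weight:

```python
import torch
import torch.nn as nn

layer = nn.Linear(100, 100, bias=False)  # weight shape [100, 100]

# Structured pruning at ratio 0.25 removes 25% of the output channels;
# here we simply keep the first 75 rows as an illustration.
keep = torch.arange(75)
pruned_weight = layer.weight.detach()[keep]

print(pruned_weight.shape)  # torch.Size([75, 100]) -- rectangular, not [N, N]
```

Because the shapes no longer match the original architecture's config, the pruned model cannot be loaded through the stock `from_pretrained` path, which is why LLM-Pruner saves the whole module object instead.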

bash scripts/llama_prune.sh

```
[START] - Start Pruning Model
Traceback (most recent call last):
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 270, in hf_raise_for_status
    response.raise_for_status()
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401...
```

If I combine multiple strategies, such as GPTQ + LLM-Pruner + LoRA, could the compression ratio of the LLM be greatly improved while keeping acceptable performance?
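Back-of-the-envelope arithmetic suggests the techniques compose multiplicatively on model size (illustrative numbers, not measured results): pruning half the parameters and then quantizing the remaining fp16 weights to 4-bit leaves roughly an eighth of the original footprint, while a LoRA adapter adds only a small constant on top.

```python
# Hypothetical compression arithmetic, as fractions of the original fp16 size.
pruned_fraction = 0.5    # 50% of parameters removed by LLM-Pruner
quant_fraction = 4 / 16  # 4-bit GPTQ weights vs. 16-bit originals

combined = pruned_fraction * quant_fraction
print(combined)  # 0.125 -> roughly 12.5% of the original model size
```

Whether the accuracy stays acceptable after stacking both is an empirical question; the errors from pruning and quantization can compound.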

The pruned model is saved with torch.save and loaded with torch.load. I was wondering if there is a way to use something like device_map='auto', similar...

When I save the model, I noticed something strange: the new pytorch_model.bin is bigger than the original model. I chose Baichuan-7B with --pruning_ratio 0.5 for the test and added --save_model to save the model...
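A larger file after pruning usually means the checkpoint was written in fp32 (or pickles the full module object) while the original download was fp16. A quick sketch with a tiny stand-in model showing the dtype effect on file size:

```python
import os
import torch
import torch.nn as nn

model = nn.Linear(256, 256)

# fp32 state dict (PyTorch's default dtype) vs. the same weights cast to fp16.
torch.save(model.state_dict(), "fp32.bin")
torch.save(model.half().state_dict(), "fp16.bin")

fp32_size = os.path.getsize("fp32.bin")
fp16_size = os.path.getsize("fp16.bin")
print(fp32_size > fp16_size)  # True -- halving the dtype roughly halves the file
```

Calling `.half()` on the pruned model before saving (if fp16 inference is acceptable) should bring the checkpoint back below the original size once 50% of the parameters are gone.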

Hi, great work first! I am confused by the model-tuning part. According to the code, it seems you used the LoRA method. This, in my opinion, will destroy...

Hi, is there a chance you could add a tutorial on adapting new models?

After pruning some of the layers, the model can no longer be loaded directly with TGI, which makes deployment difficult. Are there any good ideas for this?