
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.

Results: 54 LLM-Pruner issues, sorted by recently updated

Thanks for your nice work! When I post-train the pruned model by running `python post_training.py --prune_model prune_log/pytorch_model.bin --data_path yahma/alpaca-cleaned --output_dir tune_log --wandb_project llama_tune --lora_r 8 --num_epochs 2 --learning_rate 1e-4 --batch_size...

If I want to further quantize the pruned model, how should I proceed? I saw this mentioned in the paper.
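The paper mentions quantization as a follow-up step; one low-effort route (an illustrative sketch, not the authors' pipeline — the tiny module below is a hypothetical stand-in for a pruned model loaded via `torch.load`) is PyTorch dynamic int8 quantization:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pruned model loaded with torch.load.
pruned_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
pruned_model.eval()

# Dynamic quantization: Linear weights are stored as int8, and activations
# are quantized on the fly at inference time. CPU-only, no calibration data.
quantized = torch.ao.quantization.quantize_dynamic(
    pruned_model, {nn.Linear}, dtype=torch.qint8
)

out = quantized(torch.randn(1, 8))
print(out.shape)  # torch.Size([1, 4])
```

For serious deployments, GPTQ-style weight quantization would likely preserve more accuracy than dynamic quantization, but it requires calibration data.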

I pruned 25% of all the layers, but the resulting shape is not what I wanted: I expected [N, N], but got [N, M] with M = N*0.25, which makes the model difficult to load.
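This is expected with structured pruning: whole channels are removed, so one dimension of the weight shrinks rather than the matrix staying square. A minimal sketch (hypothetical layer sizes, keeping rows by index rather than using LLM-Pruner's importance scores) of what happens to a Linear weight:

```python
import torch
import torch.nn as nn

layer = nn.Linear(100, 100, bias=False)  # weight shape [100, 100]

# Structured pruning at ratio 0.25 removes 25% of the output channels;
# here we simply keep the first 75 rows as an illustration.
keep = torch.arange(75)
pruned_weight = layer.weight.detach()[keep]

print(pruned_weight.shape)  # torch.Size([75, 100]) -- rectangular, not [N, N]
```

Because the shapes no longer match the original architecture's config, the pruned model cannot be loaded through the stock `from_pretrained` path, which is why LLM-Pruner saves the whole module object instead.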

bash scripts/llama_prune.sh

```
[START] - Start Pruning Model
Traceback (most recent call last):
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 270, in hf_raise_for_status
    response.raise_for_status()
  File "/home/azuryl/anaconda3/envs/llamaprune/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401...
```

If I combine multiple strategies, such as GPTQ + LLM-Pruner + LoRA, could the compression ratio of the LLM be greatly improved while keeping acceptable performance?
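Back-of-the-envelope arithmetic suggests the techniques compose multiplicatively on model size (illustrative numbers, not measured results): pruning half the parameters and then quantizing the remaining fp16 weights to 4-bit leaves roughly an eighth of the original footprint, while a LoRA adapter adds only a small constant on top.

```python
# Hypothetical compression arithmetic, as fractions of the original fp16 size.
pruned_fraction = 0.5    # 50% of parameters removed by LLM-Pruner
quant_fraction = 4 / 16  # 4-bit GPTQ weights vs. 16-bit originals

combined = pruned_fraction * quant_fraction
print(combined)  # 0.125 -> roughly 12.5% of the original model size
```

Whether the accuracy stays acceptable after stacking both is an empirical question; the errors from pruning and quantization can compound.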

The pruned model is saved with torch.save and loaded with torch.load. I was wondering if there is a way to use something like device_map='auto', similar...

When I save the model, I noticed something strange: the new pytorch_model.bin is bigger than the original model. I chose Baichuan-7B with --pruning_ratio 0.5 for the test and added --save_model to save the model...
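A larger file after pruning usually means the checkpoint was written in fp32 (or pickles the full module object) while the original download was fp16. A quick sketch with a tiny stand-in model showing the dtype effect on file size:

```python
import os
import torch
import torch.nn as nn

model = nn.Linear(256, 256)

# fp32 state dict (PyTorch's default dtype) vs. the same weights cast to fp16.
torch.save(model.state_dict(), "fp32.bin")
torch.save(model.half().state_dict(), "fp16.bin")

fp32_size = os.path.getsize("fp32.bin")
fp16_size = os.path.getsize("fp16.bin")
print(fp32_size > fp16_size)  # True -- halving the dtype roughly halves the file
```

Calling `.half()` on the pruned model before saving (if fp16 inference is acceptable) should bring the checkpoint back below the original size once 50% of the parameters are gone.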

Hi, great work first! I am confused by the model-tuning part. According to the code, it seems you used the LoRA method. This, in my opinion, will destroy...

Hi, is there a chance you could add a tutorial on adapting new models?

After pruning some of the layers, the model can no longer be loaded directly with TGI, which makes deployment difficult. Are there any good ideas for this?