Ma Xinyin

Results 58 comments of Ma Xinyin

Have you tried the copied version of `decapoda-research/llama-7b-hf`, e.g., https://huggingface.co/baffo32/decapoda-research-llama-7B-hf? We will try that kind of checkpoint in the next few days to see whether the results are reproducible with those available copies.

I have no idea about this😢. I can only guess at two possible reasons: (1) the EOS-token issue, or (2) the weights of the two checkpoints differ slightly.

Does that mean that you want to use those weights that have already been pruned?

You can get the positions of the pruned parameters by replacing [Line 150](https://github.com/horseee/LLM-Pruner/blob/1455fa9646bc8b87ccbc613cf1b97e5729e06152/hf_prune.py#L150) in hf_prune.py with:

```python
for group in pruner.step(interactive=True):
    print(group.details())
    group.prune()
```

And the location of the pruned weights...

Not exactly. There's just a minor detail that needs to be corrected. Take this example: the down_proj Linear layer has in_features=11008 and out_features=4096, which, in PyTorch, would create a...
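A minimal sketch of the detail above: PyTorch's `nn.Linear(in_features, out_features)` stores its weight with shape `(out_features, in_features)`, so pruning the MLP's intermediate channels removes *columns* of down_proj but *rows* of gate_proj/up_proj. The helper below is hypothetical, for illustration only, using the LLaMA-7B sizes from the example:

```python
def pruned_mlp_shapes(hidden=4096, intermediate=11008, ratio=0.5):
    """Weight shapes after pruning `ratio` of the intermediate channels.

    down_proj(in=11008, out=4096) has a (4096, 11008) weight, so channel
    pruning shrinks its second dimension; gate_proj/up_proj map
    hidden -> intermediate, so pruning shrinks their first dimension.
    """
    kept = intermediate - int(intermediate * ratio)
    return {
        "gate_proj": (kept, hidden),   # hidden -> intermediate (rows pruned)
        "up_proj":   (kept, hidden),   # hidden -> intermediate (rows pruned)
        "down_proj": (hidden, kept),   # intermediate -> hidden (columns pruned)
    }

print(pruned_mlp_shapes())
# → {'gate_proj': (5504, 4096), 'up_proj': (5504, 4096), 'down_proj': (4096, 5504)}
```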

> I guess I only need to care about things after "=>", right?

That's correct. The left side of `=>` serves as the trigger, and the pruning process only affects...

Hi. I'm not sure how you set the 50% sparsity. If you set the pruning ratio to 50% in the command, there are several factors that would cause the parameters...
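One such factor can be sketched with back-of-the-envelope arithmetic: if channel pruning is applied only to a subset of layers while the embedding/lm_head and the excluded layers keep all their weights, a 50% pruning ratio removes far less than 50% of the parameters. The layer count, the "26 pruned layers", and the LLaMA-7B dimensions below are illustrative assumptions, not LLM-Pruner's exact configuration:

```python
def remaining_param_fraction(ratio=0.5, num_pruned_layers=26,
                             hidden=4096, inter=11008,
                             layers=32, vocab=32000):
    """Rough fraction of parameters left after channel pruning.

    Ignores norms and biases; assumed numbers follow LLaMA-7B.
    """
    attn = 4 * hidden * hidden       # q/k/v/o projections
    mlp = 3 * hidden * inter         # gate/up/down projections
    per_layer = attn + mlp
    embed = 2 * vocab * hidden       # embed_tokens + lm_head, never pruned
    total = layers * per_layer + embed
    removed = num_pruned_layers * per_layer * ratio
    return (total - removed) / total

print(f"{remaining_param_fraction():.2%}")  # roughly 61% of parameters remain
```

So under these assumptions, a 50% channel ratio removes only about 39% of the total parameters.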

> I assume the correct way to do it would go something like:
> 0. (optional) Increase size and topic breadth of LLM-Pruner Corpus
> 1. LLM-Pruner
> 2. LoRA/QLoRa...

> @horseee Hi, may I ask why you don't compare the results between pure quantization and pure pruning in the paper?

Hi. Quantization is orthogonal to pruning and hence...

Hi. We conducted a quick experiment and here is the inference performance:

| Pruning Ratio | #Param | Memory | Latency | Speedup | BoolQ | PIQA | HellaSwag |...