Equivalent of LLMPruner's root_instances in TorchPruning
Pruning LLMs with prune_llm.py hurts performance severely, because (as far as I can tell) it prunes every layer. LLMPruner, on the other hand, lets you "flag" the layers to prune via the "root_instances" argument, so you can prune only the "intermediate" structure of the LLM and recover performance after fine-tuning with LoRA. I'm using first-order Taylor pruning in both cases.
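For reference, this is roughly how the root_instances selection looks when I use LLMPruner on a Llama model (a simplified sketch from memory; module paths and the exact layer range are what I use, not necessarily the library defaults):

```python
# Sketch: restrict pruning to the "middle" decoder blocks, as LLMPruner's
# root_instances argument allows. Only these modules become the roots of the
# dependency groups; embeddings, lm_head and the first/last blocks stay untouched.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
layers = model.model.layers          # 32 decoder layers for the 8B model

# Prune only layers 4..27 (skip the first and last 4 blocks).
root_instances = (
    [layers[i].self_attn.q_proj for i in range(4, 28)]
    + [layers[i].mlp.gate_proj for i in range(4, 28)]
)
```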
Is there a way to get the "root_instances" behaviour from LLMPruner in TorchPruning?
Currently I'm trying to use the "ignored_layers" argument to skip pruning of, for example, the first and last 4 decoder layers of Llama-3.1-8B (roughly as in the sketch below), but I'm getting size mismatch errors during inference. I'm still experimenting.
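For completeness, this is a minimal sketch of what I'm attempting with TorchPruning (argument names may differ between releases, e.g. pruning_ratio vs. ch_sparsity, GroupTaylorImportance vs. TaylorImportance):

```python
import torch
import torch_pruning as tp
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.eval()

layers = model.model.layers
num_layers = len(layers)  # 32

# Ignore embeddings, lm_head and the first/last 4 decoder layers.
ignored_layers = [model.model.embed_tokens, model.lm_head]
ignored_layers += [layers[i] for i in range(4)]
ignored_layers += [layers[i] for i in range(num_layers - 4, num_layers)]

example_inputs = tokenizer("Hello world", return_tensors="pt").input_ids

imp = tp.importance.GroupTaylorImportance()  # first-order Taylor importance

pruner = tp.pruner.MetaPruner(
    model,
    example_inputs,
    importance=imp,
    pruning_ratio=0.25,            # target sparsity for the non-ignored layers
    ignored_layers=ignored_layers,
    global_pruning=False,
)

# Taylor importance needs gradients, so run a small calibration batch first.
calib = tokenizer("Some calibration text for the Taylor scores.", return_tensors="pt")
loss = model(**calib, labels=calib.input_ids).loss
loss.backward()

pruner.step()
```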