Equivalent of LLMPruner's root_instances in TorchPruning
Pruning LLMs with prune_llm.py hurts performance severely, because (as far as I can tell) it prunes every layer. LLMPruner, on the other hand, lets you "flag" the layers to prune via the "root_instances" argument, so you can prune only the "intermediate" structure of the LLM and recover performance after fine-tuning with LoRA. I'm using first-order Taylor pruning in both cases.
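For reference, this is roughly how the root_instances selection looks when I use LLMPruner on a Llama model (a simplified sketch from memory; module paths and the exact layer range are what I use, not necessarily the library defaults):

```python
# Sketch: restrict pruning to the "middle" decoder blocks, as LLMPruner's
# root_instances argument allows. Only these modules become the roots of the
# dependency groups; embeddings, lm_head and the first/last blocks stay untouched.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
layers = model.model.layers          # 32 decoder layers for the 8B model

# Prune only layers 4..27 (skip the first and last 4 blocks).
root_instances = (
    [layers[i].self_attn.q_proj for i in range(4, 28)]
    + [layers[i].mlp.gate_proj for i in range(4, 28)]
)
```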
Is there a way to get the "root_instances" behaviour from LLMPruner in TorchPruning?
Currently I'm trying to use the "ignored_layers" argument to skip pruning of, for example, the first and last 4 decoder layers of Llama-3.1-8B (roughly as in the sketch below), but I'm getting size mismatch errors during inference. I'm still experimenting.
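For completeness, this is a minimal sketch of what I'm attempting with TorchPruning (argument names may differ between releases, e.g. pruning_ratio vs. ch_sparsity, GroupTaylorImportance vs. TaylorImportance):

```python
import torch
import torch_pruning as tp
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.eval()

layers = model.model.layers
num_layers = len(layers)  # 32

# Ignore embeddings, lm_head and the first/last 4 decoder layers.
ignored_layers = [model.model.embed_tokens, model.lm_head]
ignored_layers += [layers[i] for i in range(4)]
ignored_layers += [layers[i] for i in range(num_layers - 4, num_layers)]

example_inputs = tokenizer("Hello world", return_tensors="pt").input_ids

imp = tp.importance.GroupTaylorImportance()  # first-order Taylor importance

pruner = tp.pruner.MetaPruner(
    model,
    example_inputs,
    importance=imp,
    pruning_ratio=0.25,            # target sparsity for the non-ignored layers
    ignored_layers=ignored_layers,
    global_pruning=False,
)

# Taylor importance needs gradients, so run a small calibration batch first.
calib = tokenizer("Some calibration text for the Taylor scores.", return_tensors="pt")
loss = model(**calib, labels=calib.input_ids).loss
loss.backward()

pruner.step()
```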