Roberto
I have the same doubt.
I have the same problem. I think we should try not pruning the first 3-5 layers and the last 3-5 layers. I'm trying...
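If it helps, here is a minimal sketch of how that skip could look with Torch-Pruning's `ignored_layers` option. The model name, the skip count of 4, and the pruning ratio are assumptions for illustration, and the full LLM-Pruner pipeline does more than this:

```python
import torch
import torch_pruning as tp
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")
model.config.use_cache = False  # avoids tracing issues, see the suggestion below

# Keep the first and last few transformer blocks out of the pruning graph.
num_layers = model.config.num_hidden_layers
skip = 4  # assumed value in the 3-5 range discussed above
ignored_layers = [
    layer for i, layer in enumerate(model.model.layers)
    if i < skip or i >= num_layers - skip
]

example_inputs = torch.randint(0, model.config.vocab_size, (1, 64))
pruner = tp.pruner.MetaPruner(
    model,
    example_inputs,
    importance=tp.importance.MagnitudeImportance(p=2),
    pruning_ratio=0.25,  # assumed global ratio
    ignored_layers=ignored_layers,
)
pruner.step()
```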
> [@Cyber-Vadok](https://github.com/Cyber-Vadok) I think it will have a size mismatch problem when loading the model after pruning? But we can try!

You are right! In LLM-Pruner there's the ["root_instances" argument](https://github.com/horseee/LLM-Pruner/blob/128a07d977f9b205d60ab14cfbc6a78f8a8e39d2/llama3.py#L114C1-L115C137) and it...
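On the size mismatch when reloading: one common workaround (a sketch, not LLM-Pruner's own save path) is to save the whole pruned model object instead of just the `state_dict`, so the changed shapes are restored with it:

```python
import torch

# After pruning, the layer shapes no longer match the original config,
# so load_state_dict() on a freshly built model will fail.
model.zero_grad()  # clear grads so they aren't serialized
torch.save(model, "pruned_model.pt")  # pickles the full module, shapes included

# Later, load the object directly instead of rebuilding from the config:
model = torch.load("pruned_model.pt", weights_only=False)
```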
I have the same problem with [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B)
As suggested [here](https://github.com/horseee/LLM-Pruner/issues/93), add `model.config.use_cache = False`.
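For anyone searching later, a minimal sketch of where that flag goes (model name taken from the comment above; my understanding is that it stops the traced forward pass from returning cache tensors the dependency graph can't handle):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")

# Disable the KV cache before running the pruner.
model.config.use_cache = False
```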
> After adding `model.config.use_cache = False`, how long does it take to prune the model? Since the cache is used to speed up the process, would it cost so many...
I think the problem is that after pruning, the shapes in your model have changed. My guess comes from [Modify static attributes or forward functions](https://github.com/VainF/Torch-Pruning/tree/master?tab=readme-ov-file#modify-static-attributes-or-forward-functions) in the README.
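A sketch of what that README section means in practice for a LLaMA/Qwen-style attention block. The attribute names here follow older `transformers` releases (the ones LLM-Pruner targets); newer versions read some of these from `config` instead, so treat this as an assumption to adapt:

```python
# num_heads etc. are plain Python ints; Torch-Pruning only rewires
# the weight tensors, so these static attributes must be fixed by hand.
for layer in model.model.layers:
    attn = layer.self_attn
    attn.num_heads = attn.q_proj.out_features // attn.head_dim
    attn.num_key_value_heads = attn.k_proj.out_features // attn.head_dim
    attn.hidden_size = attn.q_proj.out_features
```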