Pruned model is same size as original
Great work on the project, really excited to see the outcomes.
However, after running the script below, the pruned model saved to `out/pruned` is the same size as the original one (6.38 GB):
```
!python /content/wanda/main.py \
    --model openlm-research/open_llama_3b_v2 \
    --prune_method wanda \
    --sparsity_ratio 0.5 \
    --sparsity_type unstructured \
    --save_model out/pruned \
    --save out/open_llama_3b_v2/unstructured/wanda/
```
Is this expected, or am I missing something?
Yes, this is expected for unstructured pruning. To my understanding, unstructured sparsity only zeroes out individual weights; the tensors are still stored densely, so it won't reduce the memory footprint on modern GPU devices.
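A minimal sketch of why that happens, using NumPy and plain magnitude pruning as a stand-in for wanda's activation-aware score (the shapes and the pruning criterion here are illustrative, not taken from the wanda code):

```python
import numpy as np

# Hypothetical dense layer weight (shape is illustrative, not from wanda).
rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)

# Unstructured 50% sparsity: zero out the half of the weights with the
# smallest magnitudes.
k = w.size // 2
threshold = np.partition(np.abs(w).ravel(), k)[k]
w_pruned = np.where(np.abs(w) >= threshold, w, 0.0)

# Roughly half the entries are now exactly zero...
sparsity = float(np.mean(w_pruned == 0.0))
print(f"sparsity: {sparsity:.2f}")

# ...but the array is still stored densely: same dtype, same element count,
# so the same number of bytes on disk and in memory.
print(f"original bytes: {w.nbytes}, pruned bytes: {w_pruned.nbytes}")
```

The zeros are saved explicitly alongside the surviving weights, which is why the checkpoint size doesn't change; shrinking the file would require a sparse or compressed storage format rather than just masking values.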
Thanks! Are there any pruning options that do reduce the memory footprint?