LLaMA-Adapter
Potential avenues of size reduction
Question: How does this model respond to pruning? Since it is an adapter model, have you tried reducing the precision of the base model, training an adapter for each layer, and swapping the adapters in on the needed layers during inference? I imagine quantization probably breaks it. If you have tried a training-aware pruning method and a training-aware quantization method separately, you may be able to compare the resulting task vectors using the method outlined here: https://arxiv.org/pdf/2212.04089.pdf. That could give enough insight into the weight shapes to reach a good level of optimization, compared with retraining from scratch with a sparse method that may or may not match the quality of the original.
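To be concrete about what I mean by "compare the task vectors": following the linked paper, a task vector is just the element-wise difference between a finetuned (here, compressed-and-finetuned) checkpoint and the base checkpoint, and two such vectors can be compared layer by layer. This is only a rough sketch of that idea, not anything from this repo; the checkpoint filenames and the flat state-dict layout are assumptions on my part.

```python
import torch

def task_vector(pretrained_sd, finetuned_sd):
    """Task vector = finetuned weights minus pretrained weights (arXiv:2212.04089)."""
    return {
        k: finetuned_sd[k].float() - pretrained_sd[k].float()
        for k in finetuned_sd
        if k in pretrained_sd and finetuned_sd[k].shape == pretrained_sd[k].shape
    }

def cosine_similarity_per_layer(tv_a, tv_b):
    """Compare two task vectors layer by layer to see where they agree or diverge."""
    sims = {}
    for k in tv_a:
        if k in tv_b:
            a, b = tv_a[k].flatten(), tv_b[k].flatten()
            denom = a.norm() * b.norm()
            sims[k] = (torch.dot(a, b) / denom).item() if denom > 0 else 0.0
    return sims

# Hypothetical checkpoint paths -- these are placeholders, not files from this repo.
base_sd = torch.load("llama_base.pth", map_location="cpu")
pruned_sd = torch.load("llama_pruned_finetuned.pth", map_location="cpu")
quant_sd = torch.load("llama_quantized_finetuned.pth", map_location="cpu")

tv_pruned = task_vector(base_sd, pruned_sd)
tv_quant = task_vector(base_sd, quant_sd)

# Layers where the two compression methods produce very different task vectors
# would be the ones to look at more closely before deciding how to shrink them.
for name, sim in sorted(cosine_similarity_per_layer(tv_pruned, tv_quant).items()):
    print(f"{name}: cosine similarity {sim:.3f}")
```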
I am not a researcher, but if it's okay to ask: what have you tried so far to sparsify it?