mlc-llm
[Question] Deployment of Pruned Models
❓ General Questions
Hi there,
I'd like to ask how a pruned model can be deployed with MLC-LLM. Because the qkv dimensions differ from layer to layer, the model is saved with torch.save rather than save_pretrained, so I'm not sure how to use it with MLC-LLM. Could you please give me some tips or advice?
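To make the problem concrete, here is a minimal sketch of how the per-layer qkv sizes could be listed. The parameter names (LLaMA-style `model.layers.N.self_attn.q_proj.weight`) and the shapes are made up for illustration and are not my actual checkpoint; with a real model the mapping could be built with `{k: tuple(v.shape) for k, v in model.state_dict().items()}`.

```python
import re
from collections import defaultdict

# Hypothetical name -> shape mapping standing in for a pruned checkpoint.
# Linear weights are stored as (out_features, in_features).
shapes = {
    "model.layers.0.self_attn.q_proj.weight": (4096, 4096),
    "model.layers.0.self_attn.k_proj.weight": (4096, 4096),
    "model.layers.0.self_attn.v_proj.weight": (4096, 4096),
    "model.layers.1.self_attn.q_proj.weight": (3072, 4096),
    "model.layers.1.self_attn.k_proj.weight": (3072, 4096),
    "model.layers.1.self_attn.v_proj.weight": (3072, 4096),
}

# Assumed naming scheme; adjust the pattern to the real parameter names.
LAYER_KEY = re.compile(r"layers\.(\d+)\.self_attn\.([qkv])_proj\.weight$")


def qkv_out_dims(shapes):
    """Collect the output dimension of q/k/v projections for each layer."""
    per_layer = defaultdict(dict)
    for name, shape in shapes.items():
        match = LAYER_KEY.search(name)
        if match:
            per_layer[int(match.group(1))][match.group(2)] = shape[0]
    return dict(sorted(per_layer.items()))


def is_uniform(per_layer):
    """True if every layer has the same q/k/v output dimensions."""
    return len({tuple(sorted(d.items())) for d in per_layer.values()}) <= 1


dims = qkv_out_dims(shapes)
for layer, d in dims.items():
    print(layer, d)
print("uniform across layers:", is_uniform(dims))
```

When the sizes are not uniform, as in this sketch, a single per-model value for the attention width cannot describe every layer, which is why I could not use save_pretrained.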
Thanks!