
How to quantize LLM to INT4?

YixinSong-e opened this issue · 4 comments

I want to quantize my fine-tuned Llama model to INT4 and deploy it on my Snapdragon 8 Gen 3 device, but I don't know how to do it. When will a tutorial be available?

YixinSong-e avatar Mar 23 '24 08:03 YixinSong-e

@bhushan23 mentioned this in Slack. We are actively working on providing sample recipes and looking into INT4. When we have a tutorial, we'll post it in Slack.

mestrona-3 avatar Mar 25 '24 16:03 mestrona-3

@mestrona-3 I hope to get it soon. When do you plan to release the tutorial?

Junhyuk avatar Apr 01 '24 02:04 Junhyuk

Hi @Junhyuk, it is on our roadmap for the next 4-6 weeks. I'll circle back here, and on Slack, when it is ready!

mestrona-3 avatar Apr 05 '24 17:04 mestrona-3

Hi @mestrona-3, thanks for the update. I will keep an eye out for it.

Junhyuk avatar Apr 06 '24 07:04 Junhyuk

Hi @Junhyuk @YixinSong-e, the Llama 2 export scripts are out now: https://github.com/quic/ai-hub-models/tree/main/qai_hub_models/models/llama_v2_7b_chat_quantized

Please give them a try and let us know how it goes.
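
For anyone looking for a starting point, models in this repo generally follow the same pattern: install the model's extra dependencies and run its export module. The commands below are a minimal sketch; the exact extras name, flags, and device string are assumptions, so defer to the README at the link above for the authoritative steps.

```bash
# Minimal sketch (untested); extras name, flags, and device string are assumptions --
# see the model README linked above for the exact invocation.

# Install the extra dependencies for this model.
pip install "qai_hub_models[llama_v2_7b_chat_quantized]"

# Run the export script to compile and profile the INT4-quantized model
# for a Snapdragon 8 Gen 3 device registered with AI Hub.
python -m qai_hub_models.models.llama_v2_7b_chat_quantized.export \
    --device "Samsung Galaxy S24"
```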

bhushan23 avatar May 29 '24 15:05 bhushan23