torchchat

Support for quantized LLMs for smaller-memory devices

Open · jhetuts opened this issue 1 year ago • 1 comment

🚀 The feature, motivation and pitch

I believe this is one of Ollama's big advantages. It would also encourage devs to test LLMs that fit their machine's capabilities. Take me: I have an M1 with 16 GB, and I can't really enjoy testing the Meta LLMs, especially Llama 3.1.

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

jhetuts · Aug 09 '24 14:08

Hi @jhetuts, thanks for providing feedback! Can you give a specific example of the quantized LLM you are referring to? For Llama 3.1 we have a few quantization options, which you can find in the README. An M1 with 16 GB should be able to run the Llama 3.1 8B model with quantization.
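
For reference, here's a minimal sketch of what that can look like with the torchchat CLI. This is based on the torchchat README; the exact scheme names and JSON keys (e.g. `linear:int4` and `groupsize`) may differ between versions, so check the quantization docs for current options.

```bash
# Download the model weights first (assumes you have Hugging Face
# access to the gated Llama 3.1 repo and are logged in).
python3 torchchat.py download llama3.1

# Generate with 4-bit grouped weight quantization to reduce memory use;
# --quantize takes a JSON config selecting the quantization scheme.
python3 torchchat.py generate llama3.1 \
  --quantize '{"linear:int4": {"groupsize": 256}}' \
  --prompt "Hello, my name is"
```

Roughly speaking, 4-bit weights shrink the 8B model's weight footprint from ~16 GB at fp16 to around 4-5 GB, which is what makes it workable on a 16 GB machine.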

larryliu0820 · Aug 09 '24 17:08

Got it @larryliu0820, I've been testing with this. Thanks!

jhetuts · Sep 05 '24 05:09