Support for quantized LLMs on smaller-memory devices
🚀 The feature, motivation and pitch
I believe this is one of Ollama's biggest advantages. Shipping quantized models would also encourage developers to test LLMs that actually fit their machine's capabilities. For example, on my M1 with 16 GB of RAM I cannot really test the Meta LLMs, especially Llama 3.1. A rough memory sketch is included below.
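To make the memory argument concrete, here is a rough back-of-envelope sketch (not torchchat code; the 8B parameter count is assumed from Llama 3.1 8B, and KV cache / activation overhead is ignored) of how much RAM the weights alone need at different precisions:

```python
# Back-of-envelope estimate of weight memory at different precisions.
# Assumption: an 8B-parameter model (e.g. Llama 3.1 8B); overhead ignored.

PARAMS = 8e9  # approximate parameter count

bytes_per_param = {
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

for dtype, nbytes in bytes_per_param.items():
    gib = PARAMS * nbytes / (1024 ** 3)
    print(f"{dtype:>9}: ~{gib:.1f} GiB of weights")

# fp16/bf16: ~14.9 GiB -> leaves no headroom on a 16 GB machine
#      int8: ~ 7.5 GiB
#      int4: ~ 3.7 GiB -> fits comfortably alongside the OS and KV cache
```

In other words, without quantization an 8B model in fp16 barely fits in 16 GB, which is exactly why int8/int4 support matters for machines like mine.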
Alternatives
No response
Additional context
No response
RFC (Optional)
No response