localGPT
Support Quantized GGML models for CPU and MPS
Hey, is it worth it to support GGML quantized models for CPU and MPS?
I've done some testing of TheBloke's GGML models. Most of them are supported by llama-cpp-python, which integrates with LangChain.
They work on CPU and MPS for Mac M1/M2.
llama-cpp-python installation (OpenBLAS/cuBLAS/CLBlast/Metal): https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast--metal
Model card: https://huggingface.co/TheBloke/vicuna-13b-v1.3.0-GGML
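For reference, here is a minimal sketch of how one of these GGML models could be loaded through LangChain's `LlamaCpp` wrapper. The model path and parameter values are illustrative (download a `.bin` file from the model card above first), and this is not the exact code from the PR:

```python
# Minimal sketch: load a GGML quantized model with llama-cpp-python via LangChain.
# Assumes llama-cpp-python was built with Metal support for M1/M2, e.g.:
#   CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/vicuna-13b-v1.3.0.ggmlv3.q4_0.bin",  # hypothetical local path
    n_ctx=2048,      # context window size
    n_gpu_layers=1,  # offload to Metal on M1/M2; use 0 for pure CPU
    n_batch=512,     # tokens processed per batch
    verbose=False,
)

print(llm("Q: What is a quantized model? A:"))
```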
@imjwang thanks for adding this. I was going to do it. I will test your PR.
I agree; there are some GGML quantized models I'd like to try.
@PromtEngineer I second the request to support quantized GGML models.
@imjwang @r0103 @Boguslaw-D this has been merged. @imjwang thank you for the PR.