
Support Quantized GGML models for CPU and MPS

imjwang opened this issue 1 year ago

Hey, is it worth supporting GGML quantized models for CPU and MPS?

I've done some testing of TheBloke's GGML models. Most of them are supported by llama.cpp, which integrates with LangChain.

They work with CPU and MPS on Mac M1/M2.

llama-cpp-python installation (OpenBLAS/cuBLAS/CLBlast/Metal): https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast--metal

Model card: https://huggingface.co/TheBloke/vicuna-13b-v1.3.0-GGML
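For reference, here is a minimal sketch (not the code from the PR) of loading one of these GGML models through LangChain's LlamaCpp wrapper; the model path and generation parameters below are placeholders. For MPS on Apple Silicon, llama-cpp-python has to be built with Metal enabled (`CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python`, per the installation link above):

```python
# Minimal sketch: run a GGML quantized model on CPU/MPS via LangChain's
# LlamaCpp wrapper (backed by llama-cpp-python). Model path is hypothetical;
# any llama.cpp-compatible GGML file from TheBloke's repos should work.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/vicuna-13b-v1.3.0.ggmlv3.q4_0.bin",  # placeholder path
    n_ctx=2048,       # context window size
    n_batch=512,      # batch size for prompt processing
    n_gpu_layers=1,   # >0 offloads layers to Metal when built with LLAMA_METAL=on
    temperature=0.7,
    verbose=False,
)

print(llm("Q: Why quantize a language model? A:"))
```

With no Metal build (or `n_gpu_layers=0`) the same code falls back to pure CPU inference, which is the other half of what this issue asks for.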

imjwang avatar Jun 28 '23 21:06 imjwang

@imjwang thanks for adding this. I was going to do it. I will test your PR.

PromtEngineer avatar Jun 29 '23 03:06 PromtEngineer

I agree; there are some GGML quantized models I'd like to try.

r0103 avatar Jun 29 '23 11:06 r0103

@PromtEngineer I second the request to support quantized GGML models.

Boguslaw-D avatar Jul 04 '23 12:07 Boguslaw-D

@imjwang @r0103 @Boguslaw-D this has been merged. @imjwang thank you for the PR.

PromtEngineer avatar Jul 04 '23 20:07 PromtEngineer