
feat: GGML model support

matthoffner opened this issue 2 years ago • 4 comments

Feature request

Support loading GGML models via ctransformers (https://github.com/marella/ctransformers) or llama.cpp (https://github.com/abetlen/llama-cpp-python).
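As a sketch of what this could look like with ctransformers (the `AutoModelForCausalLM.from_pretrained` call is the library's real API; the file name, the model-type mapping, and the helper function are assumptions for illustration):

```python
# Heuristic helper (hypothetical): guess the ctransformers model_type
# from a GGML weights file name.
def ggml_model_type(filename: str) -> str:
    name = filename.lower()
    for key, mtype in [("starcoder", "starcoder"),
                       ("starchat", "starcoder"),
                       ("falcon", "falcon"),
                       ("llama", "llama")]:
        if key in name:
            return mtype
    raise ValueError(f"unrecognized GGML model family: {filename}")

# With a GGML weights file on disk (hypothetical path), loading and
# running inference on CPU would look like:
#
#   from ctransformers import AutoModelForCausalLM
#
#   path = "starcoder-ggml-q4_0.bin"
#   llm = AutoModelForCausalLM.from_pretrained(
#       path, model_type=ggml_model_type(path))
#   print(llm("def fibonacci(n):", max_new_tokens=32))
```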

Motivation

CPU support for StarCoder and, eventually, Falcon models, plus overall performance improvements.

Other

No response

matthoffner · Jun 15 '23 01:06

I can't seem to run inference on M1 for StarCoder and Falcon.

aarnphm · Jun 15 '23 01:06

StarCoder works for me with ctransformers but not llama.cpp. I have some examples here: https://huggingface.co/spaces/matthoffner/starchat-ggml

Llama.cpp has great M1 support with Metal now.

Edit: Falcon is now working with ctransformers.
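For reference, llama.cpp's Metal backend is enabled at build time, so llama-cpp-python must be installed with the Metal CMake flag; at runtime, `n_gpu_layers` controls GPU offload. A minimal sketch (the install flag and `Llama` constructor are the library's real interface; the model path and the small helper are assumptions):

```python
# llama-cpp-python must be built with Metal support, e.g.:
#
#   CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

def llama_kwargs(metal: bool, n_layers: int = 1) -> dict:
    """Hypothetical helper: keyword arguments for Llama(...) on M1/Metal.

    n_gpu_layers > 0 offloads layers to the GPU; 0 stays on CPU.
    """
    return {"n_gpu_layers": n_layers if metal else 0}

# With a GGML file on disk (hypothetical path), usage would look like:
#
#   from llama_cpp import Llama
#
#   llm = Llama(model_path="llama-2-7b.ggmlv3.q4_0.bin",
#               **llama_kwargs(metal=True))
#   out = llm("Q: Name a GGML runtime. A:", max_tokens=16)
#   print(out["choices"][0]["text"])
```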

matthoffner · Jun 15 '23 02:06

Got it, will take a look after I finish the fine-tuning API.

aarnphm · Jun 15 '23 04:06

Will track development in #178.

aarnphm · Aug 18 '23 02:08