OpenLLM
                        feat: GGML model support
Feature request
Add support for running GGML models via ctransformers (https://github.com/marella/ctransformers) or llama.cpp (https://github.com/abetlen/llama-cpp-python).
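A rough sketch of how the two runtimes could be split by model family, based on the discussion below (llama.cpp covers llama-family models well on M1/Metal, while ctransformers handles StarCoder and Falcon). `select_backend` is a hypothetical helper, not an existing OpenLLM API, and the family list is illustrative:

```python
# Hypothetical dispatch between the two GGML runtimes mentioned above.
# Neither the function nor the family list is part of OpenLLM today.

def select_backend(model_type: str) -> str:
    """Route llama-family models to llama-cpp-python (good Metal/M1
    support) and other architectures (e.g. starcoder, falcon) to
    ctransformers."""
    llama_family = {"llama", "alpaca", "vicuna", "codellama"}
    if model_type.lower() in llama_family:
        return "llama-cpp-python"
    return "ctransformers"

# With ctransformers, the actual load would look roughly like (untested,
# model path is illustrative):
#   from ctransformers import AutoModelForCausalLM
#   llm = AutoModelForCausalLM.from_pretrained(
#       "path/to/starcoder-ggml", model_type="starcoder")
#   print(llm("def fib(n):", max_new_tokens=64))
```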
Motivation
CPU support for StarCoder and eventually Falcon models, plus overall performance improvements.
Other
No response
I can't seem to run inference on M1 for StarCoder and Falcon.
StarCoder works for me with ctransformers but not llama.cpp. I have some examples here: https://huggingface.co/spaces/matthoffner/starchat-ggml
llama.cpp now has great M1 support with Metal.
Edit: Falcon is now working with ctransformers.
Got it, I'll take a look after I finish the fine-tuning API.
Will track development in #178.