BitNet
How to build llama.cpp Python bindings
Hello. The README advises launching the setup_env Python script, which recompiles the llama-cpp-cli executable. I want to easily integrate the new optimized llama-cpp into a Gradio or Streamlit Python app: how should I proceed? (I can't use the Transformers Python API because its performance is lower, as explained on the Hugging Face page.) Do I need to recompile llama-cpp (and then the Python bindings), and if so, how do I benefit from all the 1-bit (BitNet) optimizations in this repo? Thanks.
You can pull the latest llama.cpp, merge our changes, and build it yourself.
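If rebuilding the Python bindings is more work than you want, one possible alternative is to call the executable that the setup script already compiled from your Python app. The snippet below is only a minimal sketch of that idea, not something confirmed by the maintainers: the binary location `build/bin/llama-cli` and the model path are assumptions (adjust them to your build), and `build_command` and `generate` are hypothetical helper names. The flags `-m`, `-p`, `-n`, `-t` and `--temp` are standard llama.cpp CLI options.

```python
import subprocess
from pathlib import Path

# Assumed locations: adjust to wherever your build and your GGUF model live.
LLAMA_CLI = Path("build/bin/llama-cli")
MODEL = Path("models/model.gguf")  # hypothetical model path


def build_command(prompt, n_predict=128, threads=2, temperature=0.8,
                  cli=LLAMA_CLI, model=MODEL):
    """Assemble the argument list for one non-interactive generation."""
    return [
        str(cli),
        "-m", str(model),            # model file
        "-p", prompt,                # prompt text
        "-n", str(n_predict),        # number of tokens to generate
        "-t", str(threads),          # CPU threads
        "--temp", str(temperature),  # sampling temperature
    ]


def generate(prompt, **kwargs):
    """Run the compiled binary once and return whatever it prints to stdout."""
    result = subprocess.run(
        build_command(prompt, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the binary exits with an error
    )
    return result.stdout
```

A function like `generate` can then be used as the callback of a Gradio or Streamlit text interface. The trade-off is that the model is reloaded on every call, so this suits quick prototypes more than a responsive app; for that, building the bindings against the merged sources as suggested above is the better route.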