llama.cpp
python bindings?
Python Bindings for llama.cpp: https://pypi.org/project/llamacpp/0.1.3/ (not mine, just found them)
As a temporary workaround until an "official" binding is available, I've written a quick script that calls the llama.cpp executable and supports streaming and interactive mode: https://github.com/shaunabanana/llama.py
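At its core it just streams the executable's stdout; a simplified sketch of the idea (the binary path and flags below are my assumptions, see the repo for the real script):

```python
# Simplified sketch of streaming output from the llama.cpp executable.
# The binary path and flags are assumptions, not the actual llama.py code.
import subprocess

proc = subprocess.Popen(
    ["./main", "-m", "./models/7B/ggml-model-q4_0.bin",
     "-p", "Hello, world", "-n", "64"],
    stdout=subprocess.PIPE,
)
# Print output as soon as it appears instead of waiting for the process to exit.
for chunk in iter(lambda: proc.stdout.read(1), b""):
    print(chunk.decode("utf-8", errors="replace"), end="", flush=True)
```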
Looks promising from the description, will try it and report back.
sweet will give it a shot
I hacked something together tonight on this. It's Python/C++ bindings to the model directly, allowing you to call model.generate() from Python and get the string returned. It doesn't support setting parameters from Python yet (working on it), but it is model agnostic, so you can load whatever ggml supports.
Merging it would however require splitting some parts of the code out of main.cpp, which @ggerganov has argued against IIRC.
https://github.com/seemanne/llamacpypy
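Usage looks roughly like the sketch below (illustrative only, based on the description above; exact names may differ in the repo):

```python
# Illustrative usage sketch; exact class and method names may differ
# from the actual llamacpypy API.
from llamacpypy import Llama

model = Llama("models/7B/ggml-model-q4_0.bin")
print(model.generate("Building a website can be done in 10 simple steps:"))
```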
@seemanne you did what I wanted, dude. This makes it easier to expose or integrate with a web or chat frontend. Appreciated, will give it a try.
OK, I updated this and put it into a proper fork. You can now pass parameters in Python. I will need to do some refactoring to pull in upstream changes each time, but it should work; I tested it on Linux and Mac.
I wrote my own ctypes bindings and wrapped them in a KoboldAI-compatible REST API.
https://github.com/LostRuins/llamacpp-for-kobold
EDIT: I've adapted the single-file bindings into a pip-installable package (will build llama.cpp on install) called llama-cpp-python
If anyone's just looking for Python bindings, I put together llama.py, which uses ctypes to expose the current C API. To use it you have to first build llama.cpp as a shared library and then put the shared library in the same directory as the llama.py file.

On Linux, for example, to build the shared library, update the Makefile to add a new target for libllama.so:

```makefile
libllama.so: llama.o ggml.o
	$(CXX) $(CXXFLAGS) -shared -fPIC -o libllama.so llama.o ggml.o $(LDFLAGS)
```

Then run make libllama.so to generate the library.
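Loading it from Python then looks something like this (a minimal sketch; llama_print_system_info is assumed here as a convenient zero-argument llama.h function to smoke-test the load):

```python
# Minimal sketch: load libllama.so from the directory of this file and
# call a zero-argument llama.h function to verify the load worked.
import ctypes
import pathlib

lib = ctypes.CDLL(str(pathlib.Path(__file__).parent / "libllama.so"))
lib.llama_print_system_info.restype = ctypes.c_char_p
print(lib.llama_print_system_info().decode("utf-8"))
```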
We are putting together a Hugging Face-like library with a Python interface that auto-downloads pre-compressed models at https://github.com/NolanoOrg/cformers/#usage. Please let us know which features and models you would like us to add.
I also found these bindings https://github.com/PotatoSpudowski/fastLLaMa
Some feature suggestions, mostly about low-level capabilities (a hypothetical interface is sketched after the list):
- Accessing the output classifier activations from Python, enabling sampling and quantitative evaluation from Python.
- Managing k/v state with its own Python object, allowing states to be swapped in and out.
- An array view on embeddings, and the possibility to bypass ggml_get_rows for feeding back embeddings crafted by hand.
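Purely to illustrate the surface area these suggestions imply, a hypothetical interface (none of these names exist in any current binding):

```python
# Hypothetical interface for the suggestions above; all names are invented.
from typing import Any, Protocol, Sequence

class LowLevelModel(Protocol):
    def eval(self, tokens: Sequence[int]) -> Sequence[float]:
        """Return the output classifier activations (logits), enabling
        custom sampling and quantitative evaluation from Python."""

    def save_kv_state(self) -> Any:
        """Return an opaque Python object holding the current k/v cache."""

    def load_kv_state(self, state: Any) -> None:
        """Swap a previously saved k/v cache back in."""

    def eval_embeddings(self, embeddings: Sequence[Sequence[float]]) -> Sequence[float]:
        """Feed hand-crafted embeddings directly, bypassing ggml_get_rows."""
```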
Having issues with both variants on an M1 Mac. from llama_cpp import Llama produces this error:
zsh: illegal hardware instruction
The Python bindings approach (after building the shared library) produces:
libllama.so' (mach-o file, but is an incompatible architecture (have 'arm64', need 'x86_64'))
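I suspect the interpreter itself is running as x86_64 under Rosetta while the library was built for arm64 (just my assumption); a quick way to check:

```python
# Report the architecture of the running Python interpreter; a native
# Apple Silicon build should print 'arm64'.
import platform
print(platform.machine())
```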
We have added most of the suggestions above in the latest fastLLaMa update 👀
As #1156 was closed as a duplicate of this issue, I am bringing the discussion here about the creation of an official Python binding in the llama.cpp repository (which I now assume is the objective of this issue).
The current external python bindings seem to be:
- llama-cpp-python
- llamacpp
- pyllamacpp
- llamacpypy
- fastllama
But none really stand out as a candidate to be merged into llama.cpp.
My proposal is to model the llama.cpp bindings after rwkv.cpp by @saharNooby (bert.cpp also follows a similar path).
- Assume llama.cpp is built as a shared library (with BUILD_SHARED_LIBS=ON)
- Create basic Python bindings that just expose the functions in the shared library as-is (sketched below)
- (optional) Create a higher-level model that builds on the basic bindings
- Change the examples to be written in Python, rather than bash
We could keep the following in mind for the basic binding:
- completeness - a complete binding, aligned to the llama.cpp interface
- simplicity - a relatively straightforward, easy-to-understand implementation
- maintainability - easy to maintain
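Purely as an illustration of how those two layers could relate (no real implementation, all names and bodies are placeholders):

```python
# Illustrative sketch of the proposed layering; all names are placeholders.
import ctypes
from dataclasses import dataclass

# Layer 1, basic binding: load the shared library and expose its
# functions as-is, adding only ctypes signatures where needed.
lib = ctypes.CDLL("libllama.so")

# Layer 2, optional higher-level model built on the basic binding.
@dataclass
class Llama:
    model_path: str

    def generate(self, prompt: str, n_tokens: int = 32) -> str:
        # A real implementation would drive the llama.h functions
        # through `lib` here; omitted in this sketch.
        raise NotImplementedError
```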
Any suggestions on which of the current external Python bindings could be considered a good start for an eventual merge into llama.cpp?
> If anyone's just looking for python bindings I put together llama.py which uses ctypes to expose the current C API.
@abetlen, could this single-file implementation be a starting point for the basic binding mentioned above?
Hey @dmahurin, w.r.t. your proposal I should point out that what you describe is the current state of llama-cpp-python:
- Builds llama.cpp as a shared library with support for all the llama.cpp build flags for OpenBLAS, CUDA, CLBlast
- Exposes the entire llama.h API as-is via ctypes
- Exposes a higher-level API that handles type conversions, memory management, etc. (example below)
- Includes examples for both APIs in Python
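For example, the higher-level API can be used along these lines (a minimal sketch; see the repo for the exact, current signatures):

```python
# Minimal usage sketch of the higher-level API; check the repository
# for the exact, current signatures.
from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")
output = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(output["choices"][0]["text"])
```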
That being said I don't have anything against moving these bindings to llama.cpp if that's something the maintainers think is worthwhile / the right approach. I would also be happy to transfer over the PyPI package as long as we don't break downstream users (text-generation-webui, langchain, babyagi, etc).
@dmahurin I don't see how merging python bindings into this repo is needed when solutions like the repo of @abetlen exist already. Putting the maintenance burden of a mainly python library on mainly cpp developers just so bash can be removed from the readme seems unwise. The python bindings are already linked in the readme, those who want them will find them.
Hi @seemanne, the purpose is not to replace bash. The purpose is to widen the development community. Like it or not, Python is a very common language in AI development.
I do not think having supported Python code would put any burden on C++ developers. Again, reference rwkv.cpp and bert.cpp. The Python support in rwkv.cpp, for example, comes in the form of two Python files.
As mentioned, there are five independent Python bindings for llama.cpp. Unifying at least the base Python binding would help focus related Python development around llama.cpp.
@abetlen, perhaps you saw that I created pull request #1660 to add low-level Python bindings from llama-cpp-python.
The PR puts llama_cpp.py and the low-level examples in the examples/ folder.
There was a bit of filtering and some squashing to get a clean history for the low-level commits. For now I excluded the multi-char change, mainly because it created a dependency on another file, util.py (and the change looks more complex than I would expect).
Any comments on the approach of the PR?