
python bindings?

Open bryanhpchiang opened this issue 1 year ago • 19 comments

bryanhpchiang avatar Mar 13 '23 07:03 bryanhpchiang

Python Bindings for llama.cpp: https://pypi.org/project/llamacpp/0.1.3/ (not mine, just found them)

MarkSchmidty avatar Mar 14 '23 05:03 MarkSchmidty

As a temporary work-around before an "official" binding is available, I've written a quick script to call the llama.cpp executable that supports streaming and interactive mode: https://github.com/shaunabanana/llama.py

shaunabanana avatar Mar 15 '23 13:03 shaunabanana

Python Bindings for llama.cpp: https://pypi.org/project/llamacpp/0.1.3/ (not mine, just found them)

looks promising from the description, will try it and give feedback

aratic avatar Mar 15 '23 16:03 aratic

sweet will give it a shot


bryanhpchiang avatar Mar 15 '23 20:03 bryanhpchiang

I hacked something together tonight on this. It's Python bindings to the model directly, allowing you to call model.generate() from Python and get the string back. It doesn't support setting parameters from Python yet (working on it), but it is model-agnostic, so you can load whatever ggml supports.

Merging it would, however, require splitting some parts of the code out of main.cpp, which @ggerganov has argued against, IIRC.

https://github.com/seemanne/llamacpypy

seemanne avatar Mar 17 '23 21:03 seemanne

@seemanne you did what I wanted, dude: easier to expose or integrate with the web or chat. Appreciated, and I will give it a try

aratic avatar Mar 18 '23 05:03 aratic

Ok, I updated this and put it into a proper fork. You can now pass parameters in Python. I will need to do some refactoring each time I pull upstream changes, but it should work, and I tested it on Linux and Mac.

seemanne avatar Mar 18 '23 12:03 seemanne

I wrote my own ctypes bindings and wrapped them in a KoboldAI-compatible REST API.

https://github.com/LostRuins/llamacpp-for-kobold

LostRuins avatar Mar 18 '23 16:03 LostRuins

EDIT: I've adapted the single-file bindings into a pip-installable package (will build llama.cpp on install) called llama-cpp-python

If anyone's just looking for python bindings I put together llama.py which uses ctypes to expose the current C API.

To use it you have to first build llama.cpp as a shared library and then put the shared library in the same directory as the llama.py file.

On Linux, for example, to build the shared library, update the Makefile to add a new target for libllama.so:

libllama.so: llama.o ggml.o
	$(CXX) $(CXXFLAGS) -shared -fPIC -o libllama.so llama.o ggml.o $(LDFLAGS)

Then run make libllama.so to generate the library.
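
Once libllama.so exists, the ctypes pattern these bindings rely on is: load the shared library, declare each function's argument and return types, then call it like a Python function. A minimal sketch of that pattern, demonstrated against libc so it runs anywhere (for llama.cpp you would load `./libllama.so` and declare the functions from llama.h the same way):

```python
import ctypes
import ctypes.util

# The ctypes pattern: load a shared library, declare argtypes/restype,
# then call the C function from Python. Shown here against libc;
# llama.cpp bindings would instead do ctypes.CDLL("./libllama.so")
# and declare the functions found in llama.h.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

libc.strlen.argtypes = [ctypes.c_char_p]  # const char *
libc.strlen.restype = ctypes.c_size_t     # size_t

print(libc.strlen(b"llama"))  # 5
```

Declaring argtypes/restype up front is what keeps the calls type-safe; without it, ctypes defaults to int-sized conversions, which silently corrupts pointers on 64-bit platforms.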

abetlen avatar Mar 22 '23 11:03 abetlen

We are putting together a Hugging Face-like library with a Python interface that auto-downloads pre-compressed models at https://github.com/NolanoOrg/cformers/#usage. Please let us know which features and models you would like us to add.

Ayushk4 avatar Mar 22 '23 20:03 Ayushk4

I also found these bindings https://github.com/PotatoSpudowski/fastLLaMa

Some feature suggestions, mostly about low level capabilities:

  • Accessing the output classifier activations from Python, enabling sampling and quantitative evaluation from Python.
  • Managing the k/v state with its own Python object, allowing states to be swapped in and out.
  • An array view on embeddings, and the possibility to bypass ggml_get_rows for feeding back embeddings crafted by hand.

Piezoid avatar Mar 22 '23 21:03 Piezoid

EDIT: I've adapted the single-file bindings into a pip-installable package (will build llama.cpp on install) called llama-cpp-python

If anyone's just looking for python bindings I put together llama.py which uses ctypes to expose the current C API. [...]

Having issues with both variants on an M1 Mac: from llama_cpp import Llama produces this error:

zsh: illegal hardware instruction

The Python bindings approach (after building the shared library) produces: libllama.so' (mach-o file, but is an incompatible architecture (have 'arm64', need 'x86_64'))
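
That arm64/x86_64 mismatch typically means the Python interpreter is running under Rosetta (x86_64) while the library was compiled natively for arm64, or vice versa. You can check which architecture your interpreter reports with:

```python
import platform

# Architecture the running Python interpreter was built for.
# On Apple Silicon this prints 'arm64' for a native interpreter,
# or 'x86_64' for one running under Rosetta; the shared library
# must be built for the same architecture the interpreter reports.
print(platform.machine())
```

If the two disagree, either rebuild the library for the interpreter's architecture or install a native-architecture Python.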

DrBenjamin avatar Mar 27 '23 08:03 DrBenjamin

I also found these bindings https://github.com/PotatoSpudowski/fastLLaMa

Some feature suggestions, mostly about low level capabilities: [...]

We have added most of these suggestions in the latest fastLLaMa update 👀

PotatoSpudowski avatar Apr 17 '23 17:04 PotatoSpudowski

As #1156 is closed as a duplicate of this issue, I am bringing the discussion here about the creation of an official Python binding in the llama.cpp repository (which I now assume is the objective of this issue).

The current external python bindings seem to be:

  • llama-cpp-python
  • llamacpp
  • pyllamacpp
  • llamacpypy
  • fastllama

But none really stand out as a candidate to be merged into llama.cpp.

My proposal is to model the llama.cpp bindings after rwkv.cpp by @saharNooby (bert.cpp also follows a similar path).

  • Assume llama.cpp is built as a shared library (built with BUILD_SHARED_LIBS=ON)
  • Create basic Python bindings that just expose the functions in the shared library as-is
  • (optional) Create a higher-level model that builds on the basic bindings
  • Change the examples to be written in Python, rather than bash

We could keep the following in mind for the basic binding:

  • completeness - should be a complete binding, aligning with the llama.cpp interface
  • simplicity - a relatively straightforward, easy-to-understand implementation
  • maintainability - should be easy to maintain
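
Under that two-layer design, the optional higher-level layer might look like the sketch below. All names here are illustrative, not the actual llama.cpp API, and a stub stands in for the loaded ctypes library so the sketch runs without libllama.so present:

```python
class _StubLib:
    """Stand-in for a ctypes.CDLL("libllama.so") handle.
    The function names below are hypothetical illustrations."""
    def llama_init_from_file(self, path):
        return 1234  # pretend opaque context handle
    def llama_free(self, ctx):
        pass

class Llama:
    """Sketch of the higher-level layer: owns the context handle
    returned by the raw binding and frees it deterministically
    via the context-manager protocol."""
    def __init__(self, lib, model_path):
        self._lib = lib
        self._ctx = lib.llama_init_from_file(model_path.encode())

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self._lib.llama_free(self._ctx)

with Llama(_StubLib(), "models/7B/ggml-model.bin") as llm:
    print(llm._ctx)  # 1234
```

The raw layer stays a mechanical mirror of llama.h (easy to keep in sync), while the wrapper handles encoding, ownership, and cleanup, which is the split rwkv.cpp follows.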

Any suggestions on which of the current external Python bindings could be considered a good starting point for eventual merging into llama.cpp?

dmahurin avatar May 17 '23 12:05 dmahurin

If anyone's just looking for python bindings I put together llama.py which uses ctypes to expose the current C API.

@abetlen , could this single file implementation be a starting point for a basic binding mentioned above?

dmahurin avatar May 17 '23 12:05 dmahurin

Hey @dmahurin, w.r.t. your proposal I should point out that what you describe is the current state of llama-cpp-python:

  • Builds llama.cpp as a shared library, with support for all the llama.cpp build flags for OpenBLAS, CUDA, and CLBLAST
  • Exposes the entire llama.h API as-is via ctypes
  • Exposes a higher-level API that handles type conversions, memory management, etc.
  • Includes examples for both APIs in Python

That being said I don't have anything against moving these bindings to llama.cpp if that's something the maintainers think is worthwhile / the right approach. I would also be happy to transfer over the PyPI package as long as we don't break downstream users (text-generation-webui, langchain, babyagi, etc).

abetlen avatar May 17 '23 16:05 abetlen

@dmahurin I don't see how merging Python bindings into this repo is needed when solutions like the repo of @abetlen already exist. Putting the maintenance burden of a mainly Python library on mainly C++ developers just so bash can be removed from the README seems unwise. The Python bindings are already linked in the README; those who want them will find them.

seemanne avatar May 17 '23 16:05 seemanne

Hi @seemanne, the purpose is not to replace bash. The purpose is to widen the development community. Like it or not, Python is a very common language in AI development.

I do not think having supported Python code would put any burden on C++ developers. Again, see rwkv.cpp and bert.cpp: the Python support in rwkv.cpp, for example, comes in the form of two Python files.

As mentioned, there are five independent Python bindings for llama.cpp. Unifying at least the base Python binding would help focus related Python llama.cpp development.

dmahurin avatar May 17 '23 16:05 dmahurin

@abetlen, perhaps you saw that I created pull request #1660 to add low level python bindings from llama-cpp-python.

The PR puts llama_cpp.py and low level examples in the examples/ folder.

There was a bit of filtering and some squashing to get a clean history for the low-level commits. For now I excluded the multi-char change, mainly because it creates a dependency on another file, util.py (and the change looks more complex than I would expect).

Any comments on the approach of the PR?

dmahurin avatar Jun 01 '23 17:06 dmahurin