llama.cpp
Create a C-style API similar to whisper.cpp
This change makes it easier to use this code as a library, for example to build Python bindings on top of it. It extracts the following functions into llama.cpp:
- `llama_model_load`
- `llama_eval`
- `llama_model_quantize`
It also moves the relevant struct definitions to llama.h. This helps, for example, to avoid redefinition of llama_hparams in quantize.cpp. Please let me know if you have any suggestions to improve this.
See here for an example of this library structure in use.
In my fork I added this struct to bundle up all the relevant data:
```cpp
struct llama_state {
    gpt_vocab vocab;
    llama_model model;

    struct {
        int64_t t_load_us    = -1;
        int64_t t_sample_us  = -1;
        int64_t t_predict_us = -1;
    } timing;
};
```
@ggerganov I have made the changes. Please let me know what you think
@j-f1 @Green-Sky @ggerganov I have done another pass at refactoring and also fixed a few logical bugs that left interactive mode broken in my original version (among other things). I have verified that interactive mode now works as intended and inference remains just as fast as before.
I have also rebased on to the latest master branch. Please take another look. Thanks!
@thomasantony We want to have a C-style API in llama.h. We cannot expose C++ constructs.

For now, leave it like this and let me apply the necessary changes on top of yours to demonstrate what I have in mind, probably tomorrow or the day after. Thanks for contributing!
Okay, thanks. In the meantime, I will rebase the new changes from the master branch onto this branch.
Superseded by #370