llama.cpp
Create a C-style API similar to whisper.cpp
This change makes it easier to use this code as a library, for example to build Python bindings on top of it. It extracts the following functions into llama.cpp:
- `llama_model_load`
- `llama_eval`
- `llama_model_quantize`
It also moves the relevant struct definitions to llama.h. This helps, for example, to avoid redefinition of llama_hparams in quantize.cpp. Please let me know if you have any suggestions to improve this.
See here for an example of this library structure in use.
In my fork I added this struct to bundle up all the relevant data:
```cpp
struct llama_state {
    gpt_vocab vocab;
    llama_model model;

    struct {
        int64_t t_load_us    = -1;
        int64_t t_sample_us  = -1;
        int64_t t_predict_us = -1;
    } timing;
};
```
@ggerganov I have made the changes. Please let me know what you think
@j-f1 @Green-Sky @ggerganov I have done another pass at refactoring and also fixed a few logical bugs that left interactive mode broken in my original version (among other things). I have verified that interactive mode now works as intended and inference remains just as fast as before.
I have also rebased on to the latest master branch. Please take another look. Thanks!
@thomasantony We want to have a C-style API in llama.h. We cannot expose C++ constructs.

For now, leave it like this and let me apply the necessary changes on top of yours to demonstrate what I have in mind, probably tomorrow or the day after. Thanks for contributing!
Okay, thanks. In the meantime, I will rebase the new changes from the master branch onto this branch.
Superseded by #370