Expose llama.cpp's progress_callback to bindings
We could expose llama.cpp's progress_callback through the bindings, providing a way to both report progress and cancel model loading.
ref #1934
This may already be possible; see https://discord.com/channels/1076964370942267462/1100510109106450493/1214449811374342144
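For reference, a minimal sketch of how the backend could wire this up, assuming the llama.h API as of early 2024 (`llama_model_params.progress_callback`, where returning `false` aborts the load). The `g_cancel_load` flag and `load_model` wrapper are hypothetical names for illustration, not actual gpt4all code:

```cpp
#include <atomic>
#include <cstdio>

#include "llama.h"

// Hypothetical flag the bindings would flip to request cancellation.
static std::atomic<bool> g_cancel_load{false};

// Called periodically by llama.cpp while tensors are copied to RAM/VRAM.
// Returning false tells llama.cpp to abort the load immediately.
static bool on_load_progress(float progress, void * /*user_data*/) {
    std::fprintf(stderr, "loading: %3.0f%%\r", progress * 100.0f);
    return !g_cancel_load.load();
}

// Hypothetical wrapper; returns nullptr if loading failed or was cancelled.
static llama_model * load_model(const char * path) {
    llama_model_params params          = llama_model_default_params();
    params.progress_callback           = on_load_progress;
    params.progress_callback_user_data = nullptr;
    return llama_load_model_from_file(path, params);
}
```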
There is an important difference between three kinds of cancellation:

- **Canceling model loading** (copying tensors from disk to RAM/VRAM): this needs the progress callback.
- **Canceling prompt processing**: because we don't split our input to `llama_decode` into batches, the simplest way forward is the ggml graph abort callback.
- **Canceling token generation**: this is simple and already implemented in the backend, because we generate one token at a time.
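For the prompt-processing case, a similar sketch, assuming `llama_set_abort_callback` from llama.h: returning `true` from the callback aborts the running compute graph so `llama_decode` returns early (at the time of this issue, llama.h noted this only works with CPU execution). `g_cancel_decode` is again a hypothetical flag:

```cpp
#include <atomic>

#include "llama.h"

// Hypothetical flag the bindings would flip to cancel prompt processing.
static std::atomic<bool> g_cancel_decode{false};

// ggml polls this while evaluating the graph; returning true aborts the
// current computation, so llama_decode() returns early.
static bool should_abort(void * /*user_data*/) {
    return g_cancel_decode.load();
}

static void install_abort_callback(llama_context * ctx) {
    llama_set_abort_callback(ctx, should_abort, nullptr);
}
```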