
Expose llama.cpp's progress_callback to bindings

Open cebtenzzre opened this issue 1 year ago • 2 comments

We could expose llama.cpp's progress_callback to provide a way to both report progress and cancel model loading via the bindings.
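In llama.cpp, `llama_model_params.progress_callback` has the signature `bool (*)(float progress, void * user_data)`, and returning `false` from it aborts loading. A binding could surface that as a plain Python callable. The following is a hedged sketch of what such an API might look like; `load_model`, `on_progress`, and `n_tensors` are hypothetical names, not the actual gpt4all bindings API, and the loop merely simulates tensor copying:

```python
# Sketch of how a binding might expose llama.cpp's progress_callback.
# Returning False from the callback cancels loading, mirroring the
# llama_model_params.progress_callback semantics. All names are hypothetical.

def load_model(path, on_progress=None, n_tensors=10):
    """Simulated model load: invokes on_progress after each tensor.

    Returns True if loading completed, False if the callback cancelled it.
    """
    for i in range(1, n_tensors + 1):
        progress = i / n_tensors
        if on_progress is not None and on_progress(progress) is False:
            return False  # caller asked to abort, like returning false in C
    return True

# Usage: report progress and cancel once half the tensors are loaded.
seen = []

def reporter(p):
    seen.append(p)
    return p < 0.5  # False at p >= 0.5 cancels the load

completed = load_model("model.gguf", on_progress=reporter)
```

The key design point is that the same callback serves both purposes the issue mentions: its argument reports progress, and its return value cancels.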

ref #1934

cebtenzzre · Feb 06 '24 19:02

https://discord.com/channels/1076964370942267462/1100510109106450493/1214449811374342144 may be possible already?

jacoobes · Mar 05 '24 21:03

> may be possible already?

There is an important difference between three kinds of cancellation:

- Canceling model loading (copying tensors from disk to RAM/VRAM): this needs the progress callback.
- Canceling prompt processing: because we don't split our input to llama_decode into batches, the simplest way forward is the ggml graph abort callback.
- Canceling token generation: this is simple and already implemented in the backend, because we generate one token at a time.
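The three cases map to different hooks in the backend. A schematic sketch of the control flow, with hypothetical names standing in for the real llama.cpp/ggml callbacks (note the inverted conventions: the load progress callback returns False to abort, while a ggml-style abort callback returns True to abort):

```python
# Schematic of the three cancellation points described above.
# All function names are hypothetical illustrations, not the backend API.

def load(progress_cb):
    # 1. Model load: per-tensor progress callback; returning False aborts,
    #    as with llama.cpp's progress_callback.
    for i in range(1, 5):
        if progress_cb(i / 4) is False:
            return False
    return True

def decode_prompt(tokens, abort_cb):
    # 2. Prompt processing: the whole prompt goes into one llama_decode
    #    call, so only a ggml-graph-style abort callback (returning True
    #    to abort) can stop it partway through.
    for _ in tokens:  # stands in for ggml graph nodes
        if abort_cb():
            return False
    return True

def generate(n, should_stop):
    # 3. Generation: one token per llama_decode call, so the loop can
    #    simply check a flag between tokens. No callback plumbing needed.
    out = []
    for i in range(n):
        if should_stop():
            break
        out.append(i)
    return out
```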

cebtenzzre · Mar 06 '24 17:03