
Fine Tuning

Open miolini opened this issue 1 year ago • 3 comments

Hey!

Thank you for your amazing job!

I'm curious: is it possible to use RLHF feedback after a response to make small incremental adjustments in a tuning process? For example, if the user flags an incorrect answer, could the model spend 60 seconds in a fine-tuning phase, save a checkpoint to disk, and then move on to the next question?
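The loop being proposed could be sketched roughly like this. This is a toy, framework-free illustration of the control flow only: `fine_tune_step` here is a hypothetical stand-in for a real gradient/RLHF update, and nothing in this sketch corresponds to an actual llama.cpp API.

```python
import json
import time

def fine_tune_step(weights, feedback, lr=0.01):
    # Toy "update": nudge each weight toward the feedback signal.
    # A real implementation would run a gradient step (e.g., a PPO
    # update in an RLHF setup) instead of this placeholder.
    return [w + lr * (feedback - w) for w in weights]

def budgeted_fine_tune(weights, feedback, budget_s=60.0,
                       checkpoint_path="ckpt.json", max_steps=1000):
    """Run update steps until the time budget expires, then checkpoint.

    Returns the updated weights and the number of steps taken.
    """
    deadline = time.monotonic() + budget_s
    steps = 0
    while time.monotonic() < deadline and steps < max_steps:
        weights = fine_tune_step(weights, feedback)
        steps += 1
    # Persist the adjusted weights so the next question starts from them.
    with open(checkpoint_path, "w") as f:
        json.dump(weights, f)
    return weights, steps
```

In an interactive session, `budgeted_fine_tune` would be called once after each answer the user marks as wrong, with the 60-second budget bounding how long the user waits before the next question.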

miolini avatar Mar 12 '23 17:03 miolini

I believe llama.cpp is only for inference, not training. Check out chatllama, but you will likely need some high-end GPUs to do RLHF. Alternatively, look at `trl` with `accelerate` for performing RLHF on models that fit on consumer GPUs.

gjmulder avatar Mar 12 '23 18:03 gjmulder

@gjmulder I would like to continue this line of thought for running such processes on CPU only. Even if it's super slow, I think it's possible to spend a small time budget (60 seconds) to improve the weights a bit and close the loop of self-improvement, like in Gödel machines.

miolini avatar Mar 12 '23 18:03 miolini

Check out thread https://github.com/ggerganov/llama.cpp/issues/23. This would allow you to have ChatGPT-type narrative conversations with the model, but is not RLHF.

gjmulder avatar Mar 12 '23 18:03 gjmulder