llama.cpp
Fine Tuning
Hey!
Thank you for your amazing work!
I'm curious: is it possible to use RLHF feedback after a response to make small incremental adjustments in a tuning process? For example, if the user decides to fine-tune after an incorrect answer, can the model spend 60 seconds in a fine-tuning phase, save a checkpoint to disk, and then move on to the next question?
I believe llama.cpp is only for inference, not training. Check out chatllama, but you will likely need some high-end GPUs to do RLHF. Alternatively, look at trl (with accelerate) for performing RLHF on models that fit on consumer GPUs.
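For reference, a single feedback-driven update with trl looks roughly like this. This is a minimal sketch based on trl's quickstart, not anything llama.cpp supports; `gpt2` is a stand-in small model and the `+1.0` reward is a placeholder for real user feedback:

```python
import torch
from transformers import GPT2Tokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer, create_reference_model
from trl.core import respond_to_batch

# Policy model with a value head, plus a frozen reference copy for the KL penalty.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
model_ref = create_reference_model(model)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

ppo_config = PPOConfig(batch_size=1, mini_batch_size=1)
ppo_trainer = PPOTrainer(ppo_config, model, model_ref, tokenizer)

# One feedback round: encode a query, sample a response, score it, take a PPO step.
query_tensor = tokenizer.encode("What is the capital of France? ", return_tensors="pt")
response_tensor = respond_to_batch(model, query_tensor)

reward = [torch.tensor(1.0)]  # placeholder: e.g. +1.0 thumbs-up, -1.0 thumbs-down
train_stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)
```

Each such step nudges the policy toward responses the user rewarded, while the reference model keeps it from drifting too far.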
@gjmulder I would like to continue this line of thinking and run such a process on CPU only. Even if it is super slow, I think it's possible to spend a small time budget (60 secs) to improve the weights a bit and close the loop of self-improvement, like in Gödel machines.
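For the CPU-only, time-budgeted idea, something along these lines could work in plain PyTorch/transformers (again, not llama.cpp). A minimal sketch, assuming supervised tuning on a corrected answer rather than full RLHF; the 60-second budget, `gpt2` model, and checkpoint path are all illustrative:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BUDGET_SECONDS = 60            # illustrative per-question tuning budget
CHECKPOINT_DIR = "checkpoint"  # illustrative path to resume from on the next question

model = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in small model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def tune_on_correction(prompt: str, corrected_answer: str) -> None:
    """Spend at most BUDGET_SECONDS nudging the weights toward the correction."""
    batch = tokenizer(prompt + corrected_answer, return_tensors="pt")
    deadline = time.monotonic() + BUDGET_SECONDS
    model.train()
    while time.monotonic() < deadline:
        # Plain causal-LM loss on the corrected answer; one optimizer step per pass.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    # Persist the adjusted weights so the loop can continue from here later.
    model.save_pretrained(CHECKPOINT_DIR)
    tokenizer.save_pretrained(CHECKPOINT_DIR)

tune_on_correction("Q: What is 2+2? A: ", "4")
```

On CPU this will only manage a handful of optimizer steps per minute on anything beyond a toy model, which is the practical limit of the approach.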
Check out the thread at https://github.com/ggerganov/llama.cpp/issues/23. That would allow you to have ChatGPT-style narrative conversations with the model, but it is not RLHF.