Leandro von Werra
The KL divergence is used as a penalty per token, whereas the score is only given to the sequence as a whole, so it is received at the last generated...
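For illustration, here is a minimal sketch of that reward construction (the function name and arguments are made up for the example, not TRL's exact code):

```python
import torch

def per_token_rewards(logprobs, ref_logprobs, score, kl_coef=0.2):
    # logprobs / ref_logprobs: 1-D torch tensors over the generated tokens
    # score: scalar reward-model score for the whole generated sequence
    kl = logprobs - ref_logprobs      # per-token KL estimate vs. the reference model
    rewards = -kl_coef * kl           # KL penalty applied at every token
    rewards[-1] += score              # sequence-level score added only at the last generated token
    return rewards
```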
Could this be related to this issue? https://github.com/lvwerra/trl/issues/183#issuecomment-1451250635 Without the code and the full error message it's a bit hard to tell what's going on.
Unfortunately, the GPT-4 model weights were not released, so we can't fine-tune it ourselves. Closing the issue for now as it seems solved :)
OpenAI only gives access to an API to use the model, not the actual weights and code needed to fine-tune it yourself. So no, it won't be possible to fine-tune...
We could probably make use of the `accelerate` [context manager for gradient accumulation](https://huggingface.co/docs/accelerate/usage_guides/gradient_accumulation#letting-accelerate-handle-gradient-accumulation)!
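A rough sketch of how that could look (the `train` wrapper and its arguments are placeholders here, not the actual trainer code):

```python
from accelerate import Accelerator

def train(model, optimizer, dataloader, accumulation_steps=4):
    # Let accelerate handle gradient accumulation instead of counting steps manually
    accelerator = Accelerator(gradient_accumulation_steps=accumulation_steps)
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    for batch in dataloader:
        # accumulate() only syncs gradients and steps on accumulation boundaries
        with accelerator.accumulate(model):
            loss = model(**batch).loss
            accelerator.backward(loss)
            optimizer.step()
            optimizer.zero_grad()
```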
Looks good as a temporary fix, but we should really change the API a bit to make this easier. :)
Exactly, otherwise our API becomes more and more dark magic :D I think for NPP, PEFT, Int8 it should all become:

```python
model = AutoModelForCausalLMWithValueHead.from_pretrained(ckpt, method_specific_kwargs)
```

Internally we can...
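As a purely illustrative example (the keyword arguments below are hypothetical, not a confirmed API), the method-specific kwargs could then look like:

```python
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

# Hypothetical kwargs for PEFT / Int8 / NPP-style loading; names may differ in the final API
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "gpt2",
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # PEFT
    load_in_8bit=True,                              # Int8
    device_map="auto",                              # naive pipeline parallelism
)
```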
Looking good :)
Yes, good point. Maybe you can add a disclaimer to the README, @ArmelRandy?
@Eddisont12 can you confirm that this solves the issue?