Leandro von Werra

Results: 155 comments by Leandro von Werra

The KL divergence is used as a penalty per token, whereas the score is only given to the sequence as a whole and is thus received at the last generated...
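A minimal sketch of how the two signals combine per token (the function name, `kl_coef`, and the toy values are illustrative, not the exact trl implementation):

```python
import torch

def compute_rewards(score, logprobs, ref_logprobs, kl_coef=0.1):
    # per-token KL estimate between the policy and the reference model
    kl = logprobs - ref_logprobs
    # the KL penalty is applied at every generated token ...
    rewards = -kl_coef * kl
    # ... while the sequence-level score is only added to the last generated token
    rewards[-1] += score
    return rewards

# toy usage: log-probs of three generated tokens under the policy and the reference model
logprobs = torch.tensor([-1.2, -0.8, -2.0])
ref_logprobs = torch.tensor([-1.0, -1.0, -1.5])
print(compute_rewards(score=1.0, logprobs=logprobs, ref_logprobs=ref_logprobs))
```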

Could this be related to this? https://github.com/lvwerra/trl/issues/183#issuecomment-1451250635 Without code and the full error message it's a bit hard to know what's going on.

Unfortunately, the GPT-4 model was not released, so we can't fine-tune it ourselves. Closing the issue for now as it seems solved :)

OpenAI will only give access to an API to use the model, not the actual weights and code to fine-tune it yourself. So no, it won't be possible to fine-tune...

We could probably make use of the `accelerate` [context manager for gradient accumulation](https://huggingface.co/docs/accelerate/usage_guides/gradient_accumulation#letting-accelerate-handle-gradient-accumulation)!
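A small sketch of that pattern with `Accelerator.accumulate` from the linked guide (the toy model, optimizer, and data are placeholders just to make it runnable):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# toy model and data purely to illustrate the pattern
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataloader = DataLoader(TensorDataset(torch.randn(32, 8), torch.randn(32, 1)), batch_size=4)

accelerator = Accelerator(gradient_accumulation_steps=4)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    # accumulate() scales the loss and only performs the optimizer step
    # once every `gradient_accumulation_steps` batches
    with accelerator.accumulate(model):
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```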

Looks good as a temporary fix, but we should really change the API a bit to make this easier. :)

Exactly, otherwise our API becomes more and more dark magic :D I think for NPP, PEFT, Int8 it should all become:

```python
model = AutoModelForCausalLMWithValueHead.from_pretrained(ckpt, method_specific_kwargs)
```

Internally we can...
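As a sketch of what that single entry point could look like, the call below uses illustrative kwargs (`load_in_8bit`, `peft_config`, `device_map`) to stand in for `method_specific_kwargs`; the exact names are an assumption, not the final API:

```python
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

# hypothetical method-specific options passed directly through from_pretrained
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "gpt2",                   # ckpt
    load_in_8bit=True,        # Int8 loading, forwarded to transformers
    peft_config=peft_config,  # PEFT / LoRA adapters
    device_map="auto",        # naive pipeline parallelism across available devices
)
```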

Looking good :)

Yes, good point, maybe you can add a disclaimer to the README @ArmelRandy

@Eddisont12 can you confirm that this solves the issue?