Leandro von Werra

Results: 155 comments by Leandro von Werra

The KL divergence is used as a penalty per token, whereas the score is only given to the sequence as a whole and is thus received at the last generated...
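A minimal sketch of how the two signals combine per token (the function name, `kl_coef`, and the toy values are illustrative, not the exact trl implementation):

```python
import torch

def compute_rewards(score, logprobs, ref_logprobs, kl_coef=0.1):
    # per-token KL estimate between the policy and the reference model
    kl = logprobs - ref_logprobs
    # the KL penalty is applied at every generated token ...
    rewards = -kl_coef * kl
    # ... while the sequence-level score is only added to the last generated token
    rewards[-1] += score
    return rewards

# toy usage: log-probs of three generated tokens under the policy and the reference model
logprobs = torch.tensor([-1.2, -0.8, -2.0])
ref_logprobs = torch.tensor([-1.0, -1.0, -1.5])
print(compute_rewards(score=1.0, logprobs=logprobs, ref_logprobs=ref_logprobs))
```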

Could this be related to this? https://github.com/lvwerra/trl/issues/183#issuecomment-1451250635 Without code and the full error message it's a bit hard to know what's going on.

Unfortunately, the GPT-4 model was not released, so we can't fine-tune it ourselves. Closing the issue for now as it seems solved :)

OpenAI will only give access to an API to use the model, not the actual weights and code to fine-tune it yourself. So no, it won't be possible to fine-tune...

We could probably make use of the `accelerate` [context manager for gradient accumulation](https://huggingface.co/docs/accelerate/usage_guides/gradient_accumulation#letting-accelerate-handle-gradient-accumulation)!
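A small sketch of that pattern with `Accelerator.accumulate` from the linked guide (the toy model, optimizer, and data are placeholders just to make it runnable):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# toy model and data purely to illustrate the pattern
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataloader = DataLoader(TensorDataset(torch.randn(32, 8), torch.randn(32, 1)), batch_size=4)

accelerator = Accelerator(gradient_accumulation_steps=4)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    # accumulate() scales the loss and only performs the optimizer step
    # once every `gradient_accumulation_steps` batches
    with accelerator.accumulate(model):
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```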

Looks good as a temporary fix, but we should really change the API a bit to make this easier. :)

Exactly, otherwise our API becomes more and more dark magic :D I think for NPP, PEFT, Int8 it should all become:

```python
model = AutoModelForCausalLMWithValueHead.from_pretrained(ckpt, method_specific_kwargs)
```

Internally we can...
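As a sketch of what that single entry point could look like, the call below uses illustrative kwargs (`load_in_8bit`, `peft_config`, `device_map`) to stand in for `method_specific_kwargs`; the exact names are an assumption, not the final API:

```python
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

# hypothetical method-specific options passed directly through from_pretrained
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "gpt2",                   # ckpt
    load_in_8bit=True,        # Int8 loading, forwarded to transformers
    peft_config=peft_config,  # PEFT / LoRA adapters
    device_map="auto",        # naive pipeline parallelism across available devices
)
```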

Looking good :)

Yes, good point, maybe you can add a disclaimer to the README @ArmelRandy

@Eddisont12 can you confirm that this solves the issue?