Leandro von Werra comments

Results: 160 comments by Leandro von Werra

Why not create an example of using PPO to train a summarization model?

Lack of time and people - feel free to try it!

Experimentation Tooling and Parameter Optimization

Hi @ayulockin! That sounds great! I think a good first step would be to work on an example (e.g. in `examples`+`docs`) of a parameter sweep with W&B. Based on feedback...

Experimentation Tooling and Parameter Optimization

Closing this for now - feel free to reopen if there's an update!

Loss suddenly increase extremely high in only one step in sentiment notebook

We have experienced this a few times when the generation is very short (only 1-2 tokens). One way to force the model to always generate tokens is to set the...

Loss suddenly increase extremely high in only one step in sentiment notebook

Haven't had time to investigate this yet, but it's tracked in #101. Yes, the `min_length` can lead to negative KL. The difference between this and the `eos_token_id=-1` is that with...
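
One plausible way to see how a length constraint could yield a negative KL term is sketched below. This is an illustration under assumptions, not a diagnosis of the issue: it assumes the per-token KL is estimated as the log-probability under the policy minus the log-probability under the reference model, evaluated on the sampled token. The two distributions and the helper names `exact_kl` and `expected_estimate` are hypothetical. When tokens are sampled from the full policy distribution, the expected estimate equals the true KL and is non-negative. When the EOS token is blocked, tokens come from a different distribution than the one being scored, and the expected estimate can drop below zero.

```python
import math

# Hypothetical next-token distributions over a 3-token vocabulary.
# Index 0 plays the role of the EOS token.
policy = [0.7, 0.2, 0.1]  # model being trained
ref = [0.5, 0.3, 0.2]     # frozen reference model


def exact_kl(p, q):
    """KL(p || q), which is always >= 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def expected_estimate(p, q, allowed):
    """Expected value of log p(x) - log q(x) when x is sampled from p
    restricted (and renormalized) to the `allowed` token indices."""
    mass = sum(p[i] for i in allowed)
    return sum((p[i] / mass) * math.log(p[i] / q[i]) for i in allowed)


# Sampling from the full distribution recovers the true KL.
unconstrained = expected_estimate(policy, ref, allowed=[0, 1, 2])

# Blocking EOS (index 0) forces sampling from tokens the policy finds unlikely.
eos_blocked = expected_estimate(policy, ref, allowed=[1, 2])

print(f"exact KL:        {exact_kl(policy, ref):.4f}")
print(f"unconstrained:   {unconstrained:.4f}")
print(f"EOS blocked:     {eos_blocked:.4f}")
```

In this toy setup the blocked tokens are exactly the ones the policy prefers more strongly than the reference does, so the remaining tokens all have a negative log-ratio.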

Loss suddenly increase extremely high in only one step in sentiment notebook

Closing for now, feel free to re-open if there's an update.

Using TRL to fine-tune Bert Classification Model?

Interesting idea! Indeed, TRL is not really set up for encoder models at this point, but rather for decoder models. In your setup each move would correspond to a forward pass in your...

Using TRL to fine-tune Bert Classification Model?

I haven't thought it through completely but I think the main change necessary is to batch the connected forward passes together. So maybe overriding the `batched_forward_pass` method would already be...

Using TRL to fine-tune Bert Classification Model?

Closing this for now - feel free to reopen if there's an update :)

minibatching changes and masking

So when we do the forward pass we actually predict one more token than we generated. E.g. when inputting 3 tokens the model will also predict a 4th token which...
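
The off-by-one described above can be illustrated with a toy stand-in for a causal language model. This is only a sketch under assumptions: `toy_forward` and `align_predictions` are hypothetical helpers, not part of TRL. The toy model returns one prediction per input position, where the prediction at position i is for the token at position i + 1. To pair predictions with the tokens that were actually fed in, the last prediction is dropped and the rest are aligned with the inputs shifted by one.

```python
def toy_forward(tokens):
    """Hypothetical stand-in for a causal LM forward pass.

    Returns one prediction per input position; the prediction at
    position i is for the token at position i + 1. The toy model
    simply predicts "previous token + 1".
    """
    return [t + 1 for t in tokens]


def align_predictions(tokens):
    """Pair each input token (from the second one on) with its prediction."""
    preds = toy_forward(tokens)
    # preds[-1] is a prediction for a token that was never part of the
    # input, so it has no target and is dropped.
    return list(zip(preds[:-1], tokens[1:]))


tokens = [10, 11, 12]        # 3 input tokens
preds = toy_forward(tokens)  # 3 predictions; the last one is for a 4th token
pairs = align_predictions(tokens)

print(preds)
print(pairs)
```

With 3 input tokens there are 3 predictions but only 2 usable (prediction, target) pairs, since the first token has no prediction and the last prediction has no target.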