Leandro von Werra

Results 160 comments of Leandro von Werra

Hi @ayulockin! That sounds great! I think a good first step would be to work on an example (e.g. in `examples` and `docs`) of a parameter sweep with W&B. Based on feedback...
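As a possible starting point for such an example, a W&B sweep is driven by a configuration dictionary that is registered with `wandb.sweep` and then executed by `wandb.agent`. The sketch below assumes a hypothetical `train_fn` that calls `wandb.init()`, reads its hyperparameters from `wandb.config` and logs a reward metric. The parameter names (`learning_rate`, `init_kl_coef`, `batch_size`), their values, the metric name `env/reward_mean` and the project name are illustrative assumptions, not a confirmed TRL setup.

```python
# Hypothetical sweep over a few PPO hyperparameters; names and values are illustrative.
sweep_config = {
    "method": "random",  # W&B also supports "grid" and "bayes"
    "metric": {"name": "env/reward_mean", "goal": "maximize"},  # assumed metric name
    "parameters": {
        "learning_rate": {"values": [1e-5, 1.41e-5, 5e-5]},
        "init_kl_coef": {"values": [0.05, 0.2]},
        "batch_size": {"values": [64, 256]},
    },
}


def launch_sweep(train_fn, project="trl-sweep", count=10):
    """Register the sweep with W&B and run `count` trials of `train_fn`.

    `train_fn` is a hypothetical training function that calls wandb.init()
    and reads its hyperparameters from wandb.config.
    """
    import wandb  # imported lazily so the config can be inspected without wandb installed

    sweep_id = wandb.sweep(sweep=sweep_config, project=project)
    wandb.agent(sweep_id, function=train_fn, count=count)
    return sweep_id
```

Keeping every parameter as a `values` list means the same config can be switched to `"method": "grid"`, in which case the lists above would give 3 x 2 x 2 = 12 runs.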

Closing this for now; feel free to reopen if there's an update!

We have experienced this a few times when the generation is very short (only 1-2 tokens). One way to force the model to always generate tokens is to set the...

Haven't had time to investigate this yet, but it's tracked in #101. Yes, setting `min_length` can lead to a negative KL. The difference between this and `eos_token_id=-1` is that with...
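One way to see why `min_length` can push the KL below zero: in `transformers`, `min_length` is enforced by masking the EOS logit (via `MinLengthLogitsProcessor`) until the minimum length is reached, so tokens are sampled from a modified distribution rather than from the policy itself. The per-token estimate log p(x) - log q(x) is only guaranteed to be non-negative in expectation when x is drawn from the policy p. The toy calculation below uses a made-up three-token vocabulary (index 0 plays the role of EOS, and the probabilities are arbitrary) and computes the exact expectation with and without EOS suppressed.

```python
import math


def expected_kl_estimate(sampling_probs, policy_probs, ref_probs):
    """Expected value of log p(x) - log q(x) when x is drawn from sampling_probs."""
    return sum(
        s * (math.log(p) - math.log(q))
        for s, p, q in zip(sampling_probs, policy_probs, ref_probs)
        if s > 0  # tokens that can never be sampled contribute nothing
    )


def suppress_token(probs, idx):
    """Zero out one token (like EOS under a min_length constraint) and renormalize."""
    masked = [0.0 if i == idx else p for i, p in enumerate(probs)]
    total = sum(masked)
    return [p / total for p in masked]


policy = [0.8, 0.1, 0.1]  # p: the policy being trained (index 0 = EOS)
ref = [0.2, 0.4, 0.4]     # q: the frozen reference model

unmasked = expected_kl_estimate(policy, policy, ref)                     # equals KL(p || q)
masked = expected_kl_estimate(suppress_token(policy, 0), policy, ref)    # EOS never sampled
print(f"sampled from p:          {unmasked:.3f}")  # 0.832
print(f"sampled with EOS masked: {masked:.3f}")    # -1.386
```

Because the policy puts most of its mass on EOS, masking it forces samples onto tokens the policy considers less likely than the reference does, and the averaged estimate turns negative even though the true KL(p || q) is positive.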

Closing for now; feel free to reopen if there's an update.

Interesting idea! Indeed, TRL is not really set up for encoder models at this point, but rather for decoder models. In your setup, each move would correspond to a forward pass in your...

I haven't thought it through completely, but I think the main change needed is to batch the connected forward passes together. So maybe overriding the `batched_forward_pass` method would already be...
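The batching idea can be sketched in plain Python, assuming each game is a list of per-move encoder inputs (lists of token ids here) and a hypothetical `model_fn` that runs one forward pass over a padded batch and returns one output per row. Instead of one forward pass per move, all moves from all games are flattened into a single padded batch and the outputs are regrouped per game afterwards. This only illustrates the idea; it is not TRL's `batched_forward_pass`, and the function names are made up.

```python
from typing import Callable, List, Sequence


def batch_connected_moves(
    games: Sequence[Sequence[List[int]]],
    model_fn: Callable[[List[List[int]], List[List[int]]], List[float]],
    pad_id: int = 0,
) -> List[List[float]]:
    """Run every move of every game in one batched forward pass.

    games: games[g][m] is the token-id input for move m of game g.
    model_fn: hypothetical model call taking (padded input_ids, attention_mask)
        and returning one output (e.g. a log-prob or value) per row.
    Returns outputs regrouped so that result[g][m] belongs to games[g][m].
    """
    # Flatten all moves, remembering how many moves each game contributed.
    flat = [move for game in games for move in game]
    sizes = [len(game) for game in games]

    # Right-pad to the longest move and build the matching attention mask.
    max_len = max((len(m) for m in flat), default=0)
    input_ids = [m + [pad_id] * (max_len - len(m)) for m in flat]
    attention_mask = [[1] * len(m) + [0] * (max_len - len(m)) for m in flat]

    # A single forward pass over all moves (skipped if there is nothing to run).
    outputs = model_fn(input_ids, attention_mask) if flat else []

    # Split the flat outputs back into per-game lists.
    grouped, start = [], 0
    for n in sizes:
        grouped.append(list(outputs[start:start + n]))
        start += n
    return grouped


def toy_model(input_ids, attention_mask):
    # Stand-in for an encoder: sums the non-padding token ids of each row.
    return [float(sum(t * m for t, m in zip(row, mask)))
            for row, mask in zip(input_ids, attention_mask)]


games = [[[1, 2], [3]], [[4, 5, 6]]]
print(batch_connected_moves(games, toy_model))  # [[3.0, 3.0], [15.0]]
```

Keeping the per-game sizes alongside the flat batch is what lets the rewards or advantages of connected moves be assigned back to the right game after the single forward pass.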

Closing this for now; feel free to reopen if there's an update :)

So when we do the forward pass, we actually predict one more token than we generated. E.g. when inputting 3 tokens, the model will also predict a 4th token, which...
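The alignment can be sketched in plain Python: for a causal LM, the logits at position i are the prediction for the token at position i + 1, so n input tokens yield n predictions, and the last one (the "4th token" when inputting 3) has no target and is dropped. The sketch assumes `logits` is a list of per-position score lists over a tiny made-up vocabulary; the same shift applies to tensors, with the logits sliced as `[:, :-1]` and the input ids as `[:, 1:]`.

```python
import math
from typing import List


def log_softmax(scores: List[float]) -> List[float]:
    """Numerically stable log-softmax over one position's vocabulary scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def token_logprobs(logits: List[List[float]], input_ids: List[int]) -> List[float]:
    """Log-probability of each token given the tokens before it.

    logits[i] is the model's prediction for the token at position i + 1, so
    len(logits) == len(input_ids). The first token has no prediction, and the
    last row of logits predicts a token that was never generated; the shift
    below drops both.
    """
    assert len(logits) == len(input_ids)
    shifted_logits = logits[:-1]  # drop the extra prediction past the end
    targets = input_ids[1:]       # drop the first token (nothing predicts it)
    return [log_softmax(row)[t] for row, t in zip(shifted_logits, targets)]


input_ids = [0, 2, 1]                                          # 3 input tokens
logits = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 0.0, 0.0]]   # 3 rows of predictions
lp = token_logprobs(logits, input_ids)
print(len(lp))  # 2: the third row of logits is never used
```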