trl with seq2seq

Open: jbdel opened this issue 3 years ago • 1 comment

Hello, thanks for releasing this code.

I would like to use this algorithm with a trained seq2seq (x -> y) model. I would initialize the active model and ref model with the trained seq2seq. Then I would proceed like this:

roll-out:
  x -> active model -> outputs y

evaluation:
  get reward for y

optimization:
  x -> active model -> force y as input, get decoder logprobs
  x -> ref model -> force y as input, get decoder logprobs
  then compute kl + reward etc.
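
The optimization step above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `token_logprobs` and `kl_shaped_rewards` are hypothetical and not part of trl, the per-step decoder logits are toy lists standing in for real model outputs, and `beta=0.2` is an arbitrary KL coefficient. The reward shaping shown (a per-token penalty of `-beta * (logprob_active - logprob_ref)`, with the sequence-level reward added on the last token) is one common way to combine "kl + reward" and is not confirmed by this thread.

```python
import math


def token_logprobs(step_logits, forced_ids):
    """Log-probability of each forced token y_t given per-step decoder logits.

    step_logits: one list of vocabulary logits per decoding step
    forced_ids:  the token id of y forced at each step (teacher forcing)
    """
    logprobs = []
    for logits, tok in zip(step_logits, forced_ids):
        # Numerically stable log-sum-exp for the softmax normaliser.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        logprobs.append(logits[tok] - log_z)
    return logprobs


def kl_shaped_rewards(active_lp, ref_lp, score, beta=0.2):
    """Per-token KL penalty, with the scalar reward added on the last token.

    beta is an assumed KL coefficient, chosen here only for illustration.
    """
    rewards = [-beta * (a - r) for a, r in zip(active_lp, ref_lp)]
    rewards[-1] += score
    return rewards


# Toy example: y has two tokens, vocabulary size is three.
y = [0, 1]
active_logits = [[2.0, 0.5, 0.1], [0.3, 1.7, 0.2]]  # x -> active model, y forced
ref_logits = [[1.5, 0.7, 0.2], [0.4, 1.2, 0.3]]     # x -> ref model, y forced

active_lp = token_logprobs(active_logits, y)
ref_lp = token_logprobs(ref_logits, y)
rewards = kl_shaped_rewards(active_lp, ref_lp, score=1.0)
```

With an encoder-decoder model, the logits would come from a forward pass that takes x as encoder input and y (shifted right) as decoder input, so only the decoder positions contribute log-probabilities.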

Does it make sense to proceed this way?

Thank you for your feedback

jbdel, Oct 20 '21 16:10

I think that makes sense. I have not used a seq2seq model yet, so you might want to start with a decoder-only model, which should work, and then compare the results to your encoder-decoder approach. Good luck!

lvwerra, Dec 23 '21 09:12