pagepal666

4 comments by pagepal666

Include response length in the reward calculation

"keep the critic at 7B even if your actor is 70B." Will this decrease performance?

> Ah, no, to be clear, what I mean is the following: Right now, the padding is done like this ('promp' - a prompt token, 'respo' - a response token):...

> @hijkzzz Could I ask a quick related question: In `actor.process_sequences()` I also see that `attention_mask` is set to False on all EOS tokens, except the final EOS token in...