pagepal666
Include response length in the reward calculation
"keep the critic at 7B even if your actor is 70B." Will this decrease performance?
> Ah, no, to be clear, what I mean is the following: Right now, the padding is done like this ('promp' - a prompt token, 'respo' - a response token):...
> @hijkzzz Could I ask a quick related question: In `actor.process_sequences()` I also see that `attention_mask` is set to False on all EOS tokens, except the final EOS token in...