xiangrongzeng

Results: 1 issue of xiangrongzeng

In step 3 of RLHF fine-tuning, there is an actor and a critic. The actor and the critic may require different tokenizers. For example, the actor is opt-1.3B, while the critic is...

question
deespeed chat