xiangrongzeng
Results: 1 issue of xiangrongzeng
In step 3 for RLHF fine-tuning, there is an actor and a critic. The actor and critic may require different tokenizers. For example, the actor is opt-1.3B, while the critic is...
question
deespeed chat