5 comments of Pattaro
https://github.com/liziniu/ReMax
> Using the same tokenizer for actor and critic in step3 is beneficial. Considering that RM model is easier to train, in step2, I try to use the actor tokenizer...
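A toy sketch of why a shared tokenizer helps in step 3, assuming (as in typical RLHF setups) that the actor's generated token ids are fed straight into the reward/critic model. Token ids only mean the same thing under the same vocabulary. With different tokenizers the ids would have to be decoded to text and re-encoded. The vocabularies and helper names below are hypothetical, and real subword tokenizers can also change sequence length on re-encoding, which this word-level toy does not show.

```python
# Hypothetical toy vocabularies standing in for two different tokenizers.
ACTOR_VOCAB = {"the": 0, "cat": 1, "sat": 2}
RM_VOCAB = {"cat": 0, "sat": 1, "the": 2}


def encode(vocab, words):
    """Map words to ids under the given (toy) vocabulary."""
    return [vocab[w] for w in words]


def decode(vocab, ids):
    """Map ids back to words under the given (toy) vocabulary."""
    inverse = {i: w for w, i in vocab.items()}
    return [inverse[i] for i in ids]


def ids_for_reward_model(actor_ids, actor_vocab, rm_vocab):
    """Return ids the reward model can consume.

    With a shared vocabulary the actor's ids pass through unchanged;
    otherwise they must be decoded to text and re-encoded.
    """
    if actor_vocab == rm_vocab:
        return list(actor_ids)
    return encode(rm_vocab, decode(actor_vocab, actor_ids))


actor_ids = encode(ACTOR_VOCAB, ["the", "cat", "sat"])  # [0, 1, 2]
print(ids_for_reward_model(actor_ids, ACTOR_VOCAB, RM_VOCAB))     # [2, 0, 1]
print(ids_for_reward_model(actor_ids, ACTOR_VOCAB, ACTOR_VOCAB))  # [0, 1, 2]
```

Passing the actor's raw ids `[0, 1, 2]` to a reward model with a different vocabulary would silently score "cat sat the" instead of "the cat sat", which is the kind of mismatch a shared tokenizer avoids.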
Really tempting. Looking forward to the author integrating it~
How can this problem be solved?
I ran into this too. Did you solve it?