Pattaro

5 comments by Pattaro

https://github.com/liziniu/ReMax

> Using the same tokenizer for actor and critic in step3 is beneficial. Considering that RM model is easier to train, in step2, I try to use the actor tokenizer...

This is really tempting. Looking forward to the author integrating it~

I ran into this too. Did you manage to solve it?