Results: 2 issues of MXuer
- I can't find the "per-token KL penalty from the SFT model" part of PPO training in the file `model/model_training/trainer_rl.py`; maybe I missed something. Could you tell me...
ml
question
I am confused about this sentence in your paper "GPT Understands, Too": **Moreover, in the inference, we only need the output embedding h and can discard the LSTM head.**...