MXuer

Results: 2 issues by MXuer

- I can't find where the "per-token KL penalty from the SFT model" is applied during PPO training in the file `model/model_training/trainer_rl.py`; maybe I missed something. Could you tell me...

ml
question

I am confused by this sentence in your paper "GPT Understands, Too": **Moreover, in the inference, we only need the output embedding h and can discard the LSTM head.**...