JingerAI comments

Results: 5 comments by JingerAI

step3_rlhf_finetuning may need two tokenizers?

@guijuzhejiang The author's suggestion is that both models belong to the same model family, as confirmed in the InstructGPT paper.

step3_rlhf_finetuning may need two tokenizers?

@xiangrongzeng Will the project support GLM? It would benefit Chinese users.

The step2 scoring looks correct but the step3 model is talking gibberish

Separately, is it necessary to load the SFT weights in step 2? In this project, the reward model (RW) starts only from the pre-trained model, which differs from InstructGPT.

The step2 scoring looks correct but the step3 model is talking gibberish

@yaozhewei You are right, many thanks. Additionally, in this project, it appears that the KL penalty is added starting from the last token. Does this approach make sense in light...

The step2 scoring looks correct but the step3 model is talking gibberish

> @JingerAI please create another issue so we can separate different problems. @panganqi and others, please try the latest branch to see if the accuracy problem is resolved. Also, please...