The critic model generates the wrong type of token when I run run_reward_vllm.py

Open Teng0828 opened this issue 1 year ago • 1 comment

I want to create my own training data, so I followed the steps for creating the generator training data. However, when I use the critic model to generate the utility (isUse) tokens, some of the preds are wrong, as shown in the picture: they are "retrieval" tokens rather than "utility" tokens. I used exactly the same command as in the README file.
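For anyone trying to reproduce this, here is a minimal sketch of how the mismatched preds could be counted. It is not part of run_reward_vllm.py. It assumes each pred is a plain string, that utility tokens look like `[Utility:1]` to `[Utility:5]`, and that retrieval tokens look like `[Retrieval]` or `[No Retrieval]`. The helper names `classify_pred` and `find_mismatches` are hypothetical.

```python
import re

# Assumed token formats (not confirmed in this thread): utility preds look
# like "[Utility:3]", retrieval preds like "[Retrieval]" or "[No Retrieval]".
UTILITY_RE = re.compile(r"\[Utility:[1-5]\]")
RETRIEVAL_RE = re.compile(r"\[(?:No )?Retrieval\]")


def classify_pred(pred: str) -> str:
    """Return 'utility', 'retrieval', or 'other' for one prediction string."""
    if UTILITY_RE.search(pred):
        return "utility"
    if RETRIEVAL_RE.search(pred):
        return "retrieval"
    return "other"


def find_mismatches(preds, expected="utility"):
    """Return (index, pred) pairs whose token type differs from `expected`."""
    return [(i, p) for i, p in enumerate(preds) if classify_pred(p) != expected]


if __name__ == "__main__":
    # Hypothetical sample, not real output from the critic model.
    sample = ["[Utility:5]", "[Retrieval]", "[Utility:2]", "[No Retrieval]"]
    for index, pred in find_mismatches(sample):
        print(index, pred)
```

Running something like this over the output file would show how many preds fall outside the expected token type and which inputs they belong to.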

May 07 '24 19:05 Teng0828

Have you figured this out? I ran into the same problem.

Jun 18 '24 12:06 fate-ubw