self-rag
The critic model generates a different type of token when I use run_reward_vllm.py to generate tokens
I want to create my own training data, so I followed the steps for creating the generator training data. But when I used the critic model to generate the utility (isUse) tokens, some of the preds were wrong, as shown in the picture.
Some of the preds are "retrieval" tokens rather than "utility" tokens, even though I used exactly the same command as in the README.
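To see how many predictions are affected, a quick check like the one below could help. It is a minimal sketch that assumes the preds have already been loaded into a Python list of strings and that the valid utility tokens are `[Utility:1]` through `[Utility:5]`. The function names are hypothetical and are not part of the self-rag repo.

```python
from collections import Counter

# Assumption: valid isUse predictions look like "[Utility:1]" ... "[Utility:5]".
UTILITY_TOKENS = {f"[Utility:{i}]" for i in range(1, 6)}


def find_mismatched_preds(preds):
    """Return (index, pred) pairs whose prediction is not a utility token."""
    return [(i, p) for i, p in enumerate(preds) if p.strip() not in UTILITY_TOKENS]


def summarize(preds):
    """Count how many preds are utility tokens versus anything else."""
    counts = Counter(
        "utility" if p.strip() in UTILITY_TOKENS else "other" for p in preds
    )
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical example preds mixing utility and retrieval tokens.
    preds = ["[Utility:5]", "[Retrieval]", "[Utility:3]", "[No Retrieval]"]
    print(find_mismatched_preds(preds))
    print(summarize(preds))
```

Listing the mismatched indices makes it easier to check whether the wrong preds come from particular input examples or prompts.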
Have you figured out this issue? I ran into the same problem.