CodeRL
How to generate Critic Scores that can mimic a reward model
Hello, hope all is well.
I wanted to ask how to generate critic scores for a solution to a coding problem. Is there a way to get a score instead of just classifying solutions with the critic model?
Hello, I don't know whether you solved this problem, but I'm running into it now as well. Could you give me some advice?
When I used the critic model to score the generated code, the results were very poor. I don't know whether I made a mistake. Have you ever encountered this?