Math_Word_Problem_Collection

Why is your Verifier trained at the same time as the Generator?

Open DavideHe opened this issue 8 months ago • 4 comments

According to this article: https://sieunpark77.medium.com/a-late-review-of-openais-training-verifiers-to-solve-math-word-problems-0d457eb706e3

"For each training problem, we sample 100 completions from the generator and label each solution as correct or incorrect"

From this wording, I think the Verifier and the Generator may be optimized from the same model but trained at different times. In the loss in your code, however, lm_loss + classifier_loss is calculated at the same time. How is the Verifier trained on the 100 samples from the generator, and how are the samples labeled?
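To make the labeling part of the question concrete, here is a minimal sketch of what I assume "label each solution as correct or incorrect" means: a sampled solution gets label 1 if its final answer matches the reference answer, and 0 otherwise. The names `extract_answer` and `label_samples` are hypothetical and are not taken from this repository, and the `####` final-answer marker is an assumption borrowed from the GSM8K data format.

```python
import re
from typing import List, Optional

# Assumption: every solution ends with a GSM8K-style "#### <number>" line.
ANSWER_RE = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_answer(solution: str) -> Optional[float]:
    """Return the last '#### <number>' value in the text, or None if absent."""
    matches = ANSWER_RE.findall(solution)
    if not matches:
        return None
    # Drop thousands separators such as "1,000" before converting.
    return float(matches[-1].replace(",", ""))


def label_samples(samples: List[str], reference: str) -> List[int]:
    """Label each sampled solution 1 (correct) or 0 (incorrect).

    Only the final answer is compared; the reasoning steps are not checked.
    """
    gold = extract_answer(reference)
    return [
        int(gold is not None and extract_answer(sample) == gold)
        for sample in samples
    ]


if __name__ == "__main__":
    reference = "2 + 3 = 5\n#### 5"
    samples = [
        "2 + 3 = 5\n#### 5",   # same final answer as the reference
        "2 + 3 = 6\n#### 6",   # different final answer
        "I am not sure.",      # no final answer at all
    ]
    print(label_samples(samples, reference))
```

If this is roughly how the labels are produced, they would be the targets for classifier_loss, and my question is whether the 100 samples come from a Generator that was trained beforehand.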

DavideHe, Jun 19 '24 12:06