Math_Word_Problem_Collection
Why is your Verifier trained at the same time as the Generator?
According to this article: https://sieunpark77.medium.com/a-late-review-of-openais-training-verifiers-to-solve-math-word-problems-0d457eb706e3
For each training problem, we sample 100 completions from the generator and label each solution as correct or incorrect
From this wording, I think the Verifier and the Generator may use the same model, but are trained at different times.
In the code, however, the loss is lm_loss + classifier_loss,
so both terms are calculated at the same time.
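To make the question concrete, here is a minimal sketch of what I mean by computing both losses in one step. It is only an illustration under my own assumptions, not the repository's code: the function names, the mean negative log-likelihood form of lm_loss, and the binary cross-entropy form of classifier_loss are all hypothetical.

```python
import math

def lm_loss(token_log_probs):
    """Mean negative log-likelihood over the target tokens (assumed form)."""
    return -sum(token_log_probs) / len(token_log_probs)

def classifier_loss(p_correct, label):
    """Binary cross-entropy between the predicted probability that a
    solution is correct and its 0/1 label (assumed form)."""
    eps = 1e-12
    p = min(max(p_correct, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def joint_loss(token_log_probs, p_correct, label):
    """Both terms are summed, so a single backward pass would update
    the language-model and verifier objectives together."""
    return lm_loss(token_log_probs) + classifier_loss(p_correct, label)

# Hypothetical values for one sampled solution labeled as correct.
token_log_probs = [math.log(0.5), math.log(0.25)]
total = joint_loss(token_log_probs, p_correct=0.8, label=1)
```

With a summed loss like this, the generator objective and the verifier objective are optimized in the same update, which is what seems to differ from the two-stage description in the article.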
How is the Verifier trained on the 100 samples from the generator,
and how are the samples labeled?