opencompass
[Feature] Add Math Dataset as in "Let’s Verify Step by Step"
Describe the feature
Some community models have been trained with the methods from the "Let’s Verify Step by Step" paper, whose training data includes problems from the MATH test set, so there is a risk of MATH data leakage.
https://arxiv.org/pdf/2305.20050.pdf
We could consider evaluating on only the 500-problem held-out test subset used in that paper, for reference.
On the other hand, our current evaluation approach cannot measure the extent of data leakage at all. Whether MATH needs to be treated separately is also worth discussing.
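As a rough illustration of the 500-sample idea, here is a minimal sketch that restricts an already-loaded test set to a given list of problem IDs. The function name `select_subset`, the `unique_id` field, and the record layout are assumptions made for this example; they are not taken from the opencompass codebase, and the actual ID list would have to come from the paper's released split.

```python
from typing import Dict, Iterable, List


def select_subset(
    records: Iterable[Dict],
    keep_ids: Iterable[str],
    id_key: str = "unique_id",  # hypothetical field name, adjust to the real data
) -> List[Dict]:
    """Keep only the records whose ID is in keep_ids, preserving input order."""
    keep = set(keep_ids)
    subset = [r for r in records if r[id_key] in keep]

    # Fail loudly if some requested IDs are absent, so a silently
    # smaller evaluation set is not mistaken for the full subset.
    found = {r[id_key] for r in subset}
    missing = keep - found
    if missing:
        raise ValueError(f"{len(missing)} requested IDs not found in records")
    return subset
```

With the full MATH test set loaded as a list of dicts and the 500 held-out IDs in a list, `select_subset(test_records, heldout_ids)` would return the reduced evaluation set.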
Will you implement it?
- [ ] I would like to implement this feature and create a PR!
Ping myself
update?