Zhuosheng Zhang
Hi, training takes roughly 8 hours for the base model and 24 hours for the large model on a single A100 GPU. The exact time may also depend on the specific GPU. As it has been a long...
Hi guys, thanks for your interest. The released models are ones I reproduced with limited computational resources after my internship finished. It is possible to obtain better results with...
Hi, there is no data leakage problem because no gold labels are used. We only collect the questions for automatic rationale generation.
Indeed. This typo is quite strange. We will fix it in the paper soon. Thanks!