Arxiv Data

Open ZhengTang1120 opened this issue 1 year ago • 2 comments

Hi,

You mentioned that the model is trained on scientific papers (29B tokens of arXiv data) as part of the math component. Did you include the full articles, or only the math content?

Thank you, Zheng Tang

ZhengTang1120, Jan 11 '24 21:01

+1

billxbf, Jan 11 '24 21:01

I used the arXiv subset of the proof-pile-2 dataset (https://huggingface.co/datasets/EleutherAI/proof-pile-2).

hills-code, Jan 15 '24 11:01