LLaMA-Pro Arxiv Data


Open ZhengTang1120 opened this issue 1 year ago • 2 comments


Hi,

You mentioned that the model was trained on scientific papers (29B tokens of arXiv data) as part of the math component. I am wondering whether you included the full articles or just the math content.

Thank you, Zheng Tang

Jan 11 '24 21:01 ZhengTang1120

Jan 11 '24 21:01 billxbf

I used the arXiv subset of the proof-pile-2 dataset (https://huggingface.co/datasets/EleutherAI/proof-pile-2).

Jan 15 '24 11:01 hills-code
