gpt-2-output-dataset

Questions regarding the dataset format and partition

Open TingchenFu opened this issue 1 year ago • 0 comments

Hi, thanks for making this great dataset public. I have downloaded webtext.train.jsonl, a file with 250k lines. I am not sure whether it is just a sample or slice of the WebText training set on which the GPT-2 models were trained. May I have access to the full WebText training set?

Looking forward to your reply or any advice.

TingchenFu, Oct 15 '22 04:10