
What is the full link for gs://gpt-2/output-dataset/v1?

Open guotong1988 opened this issue 2 years ago • 5 comments

Thank you very much!

guotong1988 avatar Feb 08 '23 01:02 guotong1988

Hi @guotong1988, you can get the full list of data files from the following code, which is taken from download_dataset.py:

import requests

for ds in [
    'webtext',
    'small-117M',  'small-117M-k40',
    'medium-345M', 'medium-345M-k40',
    'large-762M',  'large-762M-k40',
    'xl-1542M',    'xl-1542M-k40',
]:
    for split in ['train', 'valid', 'test']:
        # Each file is named <dataset>.<split>.jsonl
        filename = ds + "." + split + '.jsonl'
        # stream=True fetches the body in chunks instead of loading it all into memory
        r = requests.get("https://openaipublic.azureedge.net/gpt-2/output-dataset/v1/" + filename, stream=True)

Based on the code above, you can download any specific file directly by appending its filename to the base URL, for example: https://openaipublic.azureedge.net/gpt-2/output-dataset/v1/small-117M.train.jsonl

Alternatively, you can run download_dataset.py to download all of the dataset files.
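If you only need one or two files, a minimal sketch along these lines might help. It builds the URLs from the same dataset and split names as above and saves a single file to disk. The helper names (dataset_url, all_urls, download) are hypothetical and not part of download_dataset.py, and it uses urllib from the standard library instead of requests so it runs without extra dependencies. It assumes the base URL above is still being served.

```python
import shutil
import urllib.request
from itertools import product

# Base URL and names copied from the snippet above.
BASE_URL = "https://openaipublic.azureedge.net/gpt-2/output-dataset/v1/"
DATASETS = [
    "webtext",
    "small-117M", "small-117M-k40",
    "medium-345M", "medium-345M-k40",
    "large-762M", "large-762M-k40",
    "xl-1542M", "xl-1542M-k40",
]
SPLITS = ["train", "valid", "test"]


def dataset_url(ds, split):
    """Return the URL for one file (hypothetical helper)."""
    return f"{BASE_URL}{ds}.{split}.jsonl"


def all_urls():
    """Return the URL of every dataset/split combination."""
    return [dataset_url(ds, split) for ds, split in product(DATASETS, SPLITS)]


def download(url, dest):
    """Copy the response body to dest without reading it all into memory."""
    with urllib.request.urlopen(url, timeout=60) as response:
        with open(dest, "wb") as f:
            shutil.copyfileobj(response, f)


if __name__ == "__main__":
    # Fetch a single split; change the names to pick a different file.
    download(dataset_url("small-117M", "valid"), "small-117M.valid.jsonl")
```

Calling all_urls() gives you the complete list of links if you would rather pass them to another download tool.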

I hope this helps.

allosharma avatar Feb 23 '23 12:02 allosharma

thank you

guotong1988 avatar Feb 23 '23 13:02 guotong1988

Thanks

HarmonyMurombo avatar Apr 22 '23 20:04 HarmonyMurombo

Thanks

PeterYang03110 avatar Jan 23 '24 12:01 PeterYang03110

Hi, how are you? I have some questions. How can I contact you? Thanks.

PeterYang03110 avatar Jan 23 '24 12:01 PeterYang03110