
WikiText 103 evaluation

Opened by karpathy · 0 comments

I've seen some repos use WikiText-103 as the dataset for evaluating GPT-like models, e.g.:

https://github.com/tysam-code/hlb-gpt/tree/main

Add a prepro script that downloads, preprocesses, and tokenizes WikiText-103, just like the existing tiny shakespeare / tiny stories scripts, following that repo. Adapt the mainline training script train_gpt2.cu to report validation performance on this set.
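A minimal sketch of the tokenize-and-write step, assuming llm.c's convention of a fixed 256-int32 header followed by raw uint16 token IDs. The `MAGIC` / `VERSION` values and the header layout here are assumptions for illustration, and a real script would first run the text through the GPT-2 BPE tokenizer (e.g. tiktoken) to produce the token IDs:

```python
import struct
from array import array

HEADER_INTS = 256   # fixed-size header of int32s (llm.c-style; assumption)
MAGIC = 20240520    # placeholder magic number, not necessarily llm.c's
VERSION = 1         # placeholder version

def write_datafile(filename, tokens):
    """Write token IDs as a 256-int32 header followed by uint16 tokens."""
    assert all(0 <= t < 2**16 for t in tokens), "token ids must fit in uint16"
    header = [0] * HEADER_INTS
    header[0] = MAGIC
    header[1] = VERSION
    header[2] = len(tokens)
    with open(filename, "wb") as f:
        f.write(struct.pack(f"<{HEADER_INTS}i", *header))
        # note: array('H') uses native byte order (little-endian on x86)
        f.write(array("H", tokens).tobytes())

def read_datafile(filename):
    """Read the file back; sanity-check the magic number."""
    with open(filename, "rb") as f:
        header = struct.unpack(f"<{HEADER_INTS}i", f.read(4 * HEADER_INTS))
        assert header[0] == MAGIC, "unexpected magic number"
        toks = array("H")
        toks.frombytes(f.read(2 * header[2]))
    return list(toks)
```

For example, writing the GPT-2 tokens of the validation split and reading them back should round-trip exactly:

```python
write_datafile("/tmp/wikitext103_val_test.bin", [1, 2, 3, 50256])
assert read_datafile("/tmp/wikitext103_val_test.bin") == [1, 2, 3, 50256]
```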

Add Python code that does the same: evaluates on WikiText-103 and reports performance for all the GPT-2 model sizes. This is our baseline to reach when training from scratch init.
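A hedged sketch of the evaluation loop such a script might use: split the validation token stream into non-overlapping windows of the model's context length, accumulate per-token negative log-likelihood, and report mean loss and perplexity. The `model_nll` callable is a hypothetical stand-in for whatever returns per-token losses (e.g. a GPT-2 forward pass); it is left abstract here:

```python
import math

def evaluate(tokens, model_nll, seq_len=1024):
    """Mean NLL (nats) and perplexity over non-overlapping windows.

    model_nll(window) must return a list of per-token negative
    log-likelihoods for the given window of token ids.
    """
    total_nll, total_tokens = 0.0, 0
    # drop the final partial window, as simple eval scripts often do
    for start in range(0, len(tokens) - seq_len + 1, seq_len):
        window = tokens[start:start + seq_len]
        losses = model_nll(window)
        total_nll += sum(losses)
        total_tokens += len(losses)
    mean_nll = total_nll / total_tokens
    return mean_nll, math.exp(mean_nll)
```

As a sanity check, a dummy model that assigns uniform probability over GPT-2's 50257-token vocabulary should score a perplexity of exactly 50257:

```python
uniform = lambda w: [math.log(50257.0)] * len(w)
mean_nll, ppl = evaluate(list(range(4096)), uniform, seq_len=1024)
# ppl ≈ 50257.0
```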

Optionally, help research other ways people have evaluated GPT-2 models, or attempted to reproduce them, in the past.

karpathy · Apr 24 '24 18:04