Use argparse in configurator.py
I really liked the simplicity of the globals() approach; this is one small improvement that adds argparse support, which gives a few things for free:

- `python train.py -h` now returns args + types, along with default values (see paste below)
- type checking happens inside argparse and gives clear error messages
Unfortunately, I had to add a hack for the betas parameter, since it's a tuple and tuples aren't natively supported by argparse. If we're fine switching to beta1 and beta2, then we can remove the literal_eval and clean up the code significantly.
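For the curious, here's a minimal sketch of the idea: one typed flag per config global, plus a tuple_arg converter for betas. This is a reconstruction from the help output below, not the PR's exact diff, and the defaults at the top stand in for train.py's real config globals:

```python
import argparse
from ast import literal_eval

# stand-ins for the config globals defined at the top of train.py
out_dir = 'out'
batch_size = 12
learning_rate = 6e-4
betas = (0.9, 0.95)
compile = True

def tuple_arg(s):
    # the betas hack: safely parse a string like "(0.9, 0.95)" into a tuple
    v = literal_eval(s)
    if not isinstance(v, tuple):
        raise argparse.ArgumentTypeError(f'expected a tuple, got {s!r}')
    return v

parser = argparse.ArgumentParser()
parser.add_argument('config_file', type=str, nargs='?', metavar='str',
                    help='Python config file to override defaults')
# build one flag per config global, typed from its current value
for key, val in list(globals().items()):
    if key.startswith('_') or not isinstance(val, (bool, int, float, str, tuple)):
        continue  # skip imports, functions, etc.
    arg_type = tuple_arg if isinstance(val, tuple) else type(val)
    # note: type=bool is the weak spot, bool('False') is True (see end of thread)
    parser.add_argument(f'--{key}', type=arg_type, default=val,
                        metavar=arg_type.__name__, help=f'default: {val}')
globals().update(vars(parser.parse_args()))
```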
Output examples:
`python train.py -h` ->
```
nanoGPT git:(configurator-argparse) ✗ python train.py -h
usage: train.py [-h] [--out_dir str] [--eval_interval int] [--log_interval int] [--eval_iters int] [--eval_only bool] [--always_save_checkpoint bool]
[--init_from str] [--wandb_log bool] [--wandb_project str] [--wandb_run_name str] [--dataset str] [--batch_size int] [--block_size int]
[--n_layer int] [--n_head int] [--n_embd int] [--dropout float] [--learning_rate float] [--max_iters int] [--weight_decay float]
[--betas tuple_arg] [--decay_lr bool] [--warmup_iters int] [--lr_decay_iters int] [--min_lr float] [--backend str] [--device str] [--dtype str]
[--compile bool]
[str]
positional arguments:
str Python config file to override defaults
optional arguments:
-h, --help show this help message and exit
--out_dir str default: out
--eval_interval int default: 2000
--log_interval int default: 1
--eval_iters int default: 200
--eval_only bool default: False
--always_save_checkpoint bool
default: True
--init_from str default: scratch
--wandb_log bool default: False
--wandb_project str default: owt
--wandb_run_name str default: gpt2
--dataset str default: openwebtext
--batch_size int default: 12
--block_size int default: 1024
--n_layer int default: 12
--n_head int default: 12
--n_embd int default: 768
--dropout float default: 0.0
--learning_rate float
default: 0.0006
--max_iters int default: 600000
--weight_decay float default: 0.01
--betas tuple_arg default: (0.9, 0.95), to pass surround in quotes e.g. --betas='(0.9, 0.95)'
--decay_lr bool default: True
--warmup_iters int default: 2000
--lr_decay_iters int default: 600000
--min_lr float default: 6e-05
--backend str default: nccl
--device str default: cuda
--dtype str default: bfloat16
--compile bool default: True
```
`python sample.py -h` ->
```
(spacy) ➜ nanoGPT git:(configurator-argparse) ✗ python sample.py -h
usage: sample.py [-h] [--out_dir str] [--start str] [--num_samples int] [--max_new_tokens int] [--temperature float] [--top_k int] [--seed int] [--device str]
[--dtype str] [--compile bool]
[str]
positional arguments:
str Python config file to override defaults
optional arguments:
-h, --help show this help message and exit
--out_dir str default: out
--start str default:
--num_samples int default: 10
--max_new_tokens int default: 500
--temperature float default: 0.8
--top_k int default: 200
--seed int default: 1337
--device str default: cuda
--dtype str default: bfloat16
--compile bool default: False
```
Other examples of usage:
```
>>> python train.py config/eval_gpt2.py
Overriding config with config/eval_gpt2.py:
# evaluate the base gpt2
# n_layer=12, n_head=12, n_embd=768
# 124M parameters
batch_size = 8
eval_iters = 500 # use more iterations to get good estimate
eval_only = True
wandb_log = False
init_from = 'gpt2'
```
Keyword usage:
```
>>> python train.py --wandb_log=False --init_from='gpt-4'
Overriding: init_from = gpt-4
Overriding: wandb_log = True
```
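As an aside, the globals() approach this builds on amounts to exec-ing a positional config file and literal_eval-ing --key=val overrides into globals(). A simplified sketch, reconstructed from the Overriding output above (the real configurator.py has a few more checks):

```python
import sys
from ast import literal_eval

for arg in sys.argv[1:]:
    if '=' not in arg:
        # assume it's the name of a Python config file to exec into our globals
        config_file = arg
        print(f"Overriding config with {config_file}:")
        with open(config_file) as f:
            print(f.read())
        exec(open(config_file).read())
    else:
        # assume it's a --key=value override of an existing global
        assert arg.startswith('--'), f"expected --key=value, got {arg!r}"
        key, val = arg[2:].split('=', 1)
        assert key in globals(), f"unknown config key: {key}"
        try:
            attempt = literal_eval(val)
        except (SyntaxError, ValueError):
            attempt = val  # fall back to the raw string
        print(f"Overriding: {key} = {attempt}")
        globals()[key] = attempt
```

Note that literal_eval('False') gives a real bool, which is exactly what the argparse type=bool path loses (see the end of the thread).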
Error message:
```
>>> python train.py --wandb_log=False --batch_size='hello'
usage: train.py [-h] [--out_dir str] [--eval_interval int] [--log_interval int] [--eval_iters int] [--eval_only bool] [--always_save_checkpoint bool]
[--init_from str] [--wandb_log bool] [--wandb_project str] [--wandb_run_name str] [--dataset str] [--batch_size int] [--block_size int]
[--n_layer int] [--n_head int] [--n_embd int] [--dropout float] [--learning_rate float] [--max_iters int] [--weight_decay float]
[--betas tuple_arg] [--decay_lr bool] [--warmup_iters int] [--lr_decay_iters int] [--min_lr float] [--backend str] [--device str] [--dtype str]
[--compile bool]
[str]
train.py: error: argument --batch_size: invalid int value: 'hello'
```
😂💀
wow, not bad looking at all, let me play with it for a bit...
I'm trying to take this path but it's just making things worse and more complicated :( E.g. now I can't do `--compile=False` because of argparse's opinions about boolean variables, which I disagree with.
https://stackoverflow.com/questions/15008758/parsing-boolean-values-with-argparse
I don't think there's enough to gain here.
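For reference: with type=bool, argparse calls bool() on the raw command-line string, and any non-empty string, including 'False', is truthy, so --compile=False still yields True. This is presumably also why the keyword-usage example above printed wandb_log = True despite --wandb_log=False. The usual workaround from that Stack Overflow thread is an explicit converter, sketched below; it was never part of this PR:

```python
import argparse

def str2bool(v):
    # argparse's type=bool calls bool() on the raw string, and any
    # non-empty string (including 'False') is truthy, so map the
    # common spellings explicitly instead
    if isinstance(v, bool):
        return v
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    if v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    raise argparse.ArgumentTypeError(f'boolean value expected, got {v!r}')

parser = argparse.ArgumentParser()
parser.add_argument('--compile', type=str2bool, default=True)
print(parser.parse_args(['--compile=False']))  # Namespace(compile=False)
```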
Oof, nice find. Yeah, the boolean args feel like a deal breaker. I'll close this PR.