Use argparse in configurator.py
I really liked the simplicity of the globals() approach; this is one small improvement that adds argparse support, which gives a few things for free:

- `python train.py -h` now returns args + types, along with default values (see paste below)
- type checking happens inside argparse and gives clear error messages
Unfortunately, I had to add a hack for the betas parameter, since it's a tuple and tuples aren't natively supported by argparse. If we're fine switching to beta1 and beta2, then we can remove the literal_eval and clean up the code significantly.
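For the curious, here's a minimal sketch of the idea: one typed flag per config global, plus a tuple_arg converter for betas. This is a reconstruction from the help output below, not the PR's exact diff, and the defaults at the top stand in for train.py's real config globals:

```python
import argparse
from ast import literal_eval

# stand-ins for the config globals defined at the top of train.py
out_dir = 'out'
batch_size = 12
learning_rate = 6e-4
betas = (0.9, 0.95)
compile = True

def tuple_arg(s):
    # the betas hack: safely parse a string like "(0.9, 0.95)" into a tuple
    v = literal_eval(s)
    if not isinstance(v, tuple):
        raise argparse.ArgumentTypeError(f'expected a tuple, got {s!r}')
    return v

parser = argparse.ArgumentParser()
parser.add_argument('config_file', type=str, nargs='?', metavar='str',
                    help='Python config file to override defaults')
# build one flag per config global, typed from its current value
for key, val in list(globals().items()):
    if key.startswith('_') or not isinstance(val, (bool, int, float, str, tuple)):
        continue  # skip imports, functions, etc.
    arg_type = tuple_arg if isinstance(val, tuple) else type(val)
    # note: type=bool is the weak spot, bool('False') is True (see end of thread)
    parser.add_argument(f'--{key}', type=arg_type, default=val,
                        metavar=arg_type.__name__, help=f'default: {val}')
globals().update(vars(parser.parse_args()))
```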
Output examples:
`python train.py -h` ->
```
nanoGPT git:(configurator-argparse) ✗ python train.py -h
usage: train.py [-h] [--out_dir str] [--eval_interval int] [--log_interval int] [--eval_iters int] [--eval_only bool] [--always_save_checkpoint bool]
[--init_from str] [--wandb_log bool] [--wandb_project str] [--wandb_run_name str] [--dataset str] [--batch_size int] [--block_size int]
[--n_layer int] [--n_head int] [--n_embd int] [--dropout float] [--learning_rate float] [--max_iters int] [--weight_decay float]
[--betas tuple_arg] [--decay_lr bool] [--warmup_iters int] [--lr_decay_iters int] [--min_lr float] [--backend str] [--device str] [--dtype str]
[--compile bool]
[str]
positional arguments:
str Python config file to override defaults
optional arguments:
-h, --help show this help message and exit
--out_dir str default: out
--eval_interval int default: 2000
--log_interval int default: 1
--eval_iters int default: 200
--eval_only bool default: False
--always_save_checkpoint bool
default: True
--init_from str default: scratch
--wandb_log bool default: False
--wandb_project str default: owt
--wandb_run_name str default: gpt2
--dataset str default: openwebtext
--batch_size int default: 12
--block_size int default: 1024
--n_layer int default: 12
--n_head int default: 12
--n_embd int default: 768
--dropout float default: 0.0
--learning_rate float
default: 0.0006
--max_iters int default: 600000
--weight_decay float default: 0.01
--betas tuple_arg default: (0.9, 0.95), to pass surround in quotes e.g. --betas='(0.9, 0.95)'
--decay_lr bool default: True
--warmup_iters int default: 2000
--lr_decay_iters int default: 600000
--min_lr float default: 6e-05
--backend str default: nccl
--device str default: cuda
--dtype str default: bfloat16
--compile bool default: True
```
`python sample.py -h` ->
```
(spacy) ➜ nanoGPT git:(configurator-argparse) ✗ python sample.py -h
usage: sample.py [-h] [--out_dir str] [--start str] [--num_samples int] [--max_new_tokens int] [--temperature float] [--top_k int] [--seed int] [--device str]
[--dtype str] [--compile bool]
[str]
positional arguments:
str Python config file to override defaults
optional arguments:
-h, --help show this help message and exit
--out_dir str default: out
--start str default:
--num_samples int default: 10
--max_new_tokens int default: 500
--temperature float default: 0.8
--top_k int default: 200
--seed int default: 1337
--device str default: cuda
--dtype str default: bfloat16
--compile bool default: False
```
Other examples of usage:
```
>>> python train.py config/eval_gpt2.py
Overriding config with config/eval_gpt2.py:
# evaluate the base gpt2
# n_layer=12, n_head=12, n_embd=768
# 124M parameters
batch_size = 8
eval_iters = 500 # use more iterations to get good estimate
eval_only = True
wandb_log = False
init_from = 'gpt2'
```
Keyword usage:
```
>>> python train.py --wandb_log=False --init_from='gpt-4'
Overriding: init_from = gpt-4
Overriding: wandb_log = True
```
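As an aside, the globals() approach this builds on amounts to exec-ing a positional config file and literal_eval-ing --key=val overrides into globals(). A simplified sketch, reconstructed from the Overriding output above (the real configurator.py has a few more checks):

```python
import sys
from ast import literal_eval

for arg in sys.argv[1:]:
    if '=' not in arg:
        # assume it's the name of a Python config file to exec into our globals
        config_file = arg
        print(f"Overriding config with {config_file}:")
        with open(config_file) as f:
            print(f.read())
        exec(open(config_file).read())
    else:
        # assume it's a --key=value override of an existing global
        assert arg.startswith('--'), f"expected --key=value, got {arg!r}"
        key, val = arg[2:].split('=', 1)
        assert key in globals(), f"unknown config key: {key}"
        try:
            attempt = literal_eval(val)
        except (SyntaxError, ValueError):
            attempt = val  # fall back to the raw string
        print(f"Overriding: {key} = {attempt}")
        globals()[key] = attempt
```

Note that literal_eval('False') gives a real bool, which is exactly what the argparse type=bool path loses (see the end of the thread).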
Error message:
```
>>> python train.py --wandb_log=False --batch_size='hello'
usage: train.py [-h] [--out_dir str] [--eval_interval int] [--log_interval int] [--eval_iters int] [--eval_only bool] [--always_save_checkpoint bool]
[--init_from str] [--wandb_log bool] [--wandb_project str] [--wandb_run_name str] [--dataset str] [--batch_size int] [--block_size int]
[--n_layer int] [--n_head int] [--n_embd int] [--dropout float] [--learning_rate float] [--max_iters int] [--weight_decay float]
[--betas tuple_arg] [--decay_lr bool] [--warmup_iters int] [--lr_decay_iters int] [--min_lr float] [--backend str] [--device str] [--dtype str]
[--compile bool]
[str]
train.py: error: argument --batch_size: invalid int value: 'hello'
```
😂💀
wow, not bad looking at all, let me play with it for a bit...
I'm trying to take this path but it's just making things worse and more complicated :( E.g. now I can't do `--compile=False` because of argparse's opinions about boolean variables, which I disagree with.
https://stackoverflow.com/questions/15008758/parsing-boolean-values-with-argparse
I don't think there's enough to gain here.
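For reference: with type=bool, argparse calls bool() on the raw command-line string, and any non-empty string, including 'False', is truthy, so --compile=False still yields True. This is presumably also why the keyword-usage example above printed wandb_log = True despite --wandb_log=False. The usual workaround from that Stack Overflow thread is an explicit converter, sketched below; it was never part of this PR:

```python
import argparse

def str2bool(v):
    # argparse's type=bool calls bool() on the raw string, and any
    # non-empty string (including 'False') is truthy, so map the
    # common spellings explicitly instead
    if isinstance(v, bool):
        return v
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    if v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    raise argparse.ArgumentTypeError(f'boolean value expected, got {v!r}')

parser = argparse.ArgumentParser()
parser.add_argument('--compile', type=str2bool, default=True)
print(parser.parse_args(['--compile=False']))  # Namespace(compile=False)
```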
Oof, nice find. Yeah, the boolean args feel like a deal breaker. I'll close this PR.