nanoGPT
Question about vocab size
First, thank you for creating nanoGPT. It has been an amazing learning experience! I have a question about vocab size and training. I built nanoGPT and ran it on the Shakespeare data with a vocab size of 12, and everything works great: training goes smoothly and the results are good. I am now experimenting with a dataset that has a vocab size of ~100 (a non-trivial density of special characters), and training is almost 50% worse. Any ideas on what is going on and how I could improve the training? Here are my current parameters:
gradient_accumulation_steps = 1
batch_size = 32
block_size = 192
n_layer = 4
n_head = 4
n_embd = 192
dropout = 0.5
learning_rate = 1e-3
max_iters = 1000
lr_decay_iters = 1000
min_lr = 1e-4
beta2 = 0.99
warmup_iters = 100
I have a GTX1080 with 8GB VRAM. Thanks!
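
For reference, here is a quick back-of-the-envelope check I did in plain Python (not nanoGPT code; the vocab sizes 12 and 100 are just the two cases above). It reminds me that raw cross-entropy losses are not directly comparable across vocab sizes, since the uniform-prediction baseline is ln(vocab_size), and it also shows how the token embedding table grows with the vocab:

import math

n_embd = 192  # from my config above

# assumed vocab sizes: 12 for my Shakespeare run, ~100 for the new dataset
for vocab_size in (12, 100):
    uniform_loss = math.log(vocab_size)     # loss (in nats) of a model that predicts uniformly
    embedding_params = vocab_size * n_embd  # token embedding table (tied with the LM head in nanoGPT, I believe)
    print(f"vocab={vocab_size:4d}  uniform-baseline loss={uniform_loss:.3f}  embedding params={embedding_params}")

So part of what I am asking is whether the ~50% gap I see is a real training problem or partly just the higher loss floor that comes with the larger vocabulary.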