kranky

Results 4 comments of kranky

Hi, I am using around 42 GB of GPU memory. Dataset: OpenWebText (20 GB). Batch size = 16. I have been training it for 8 days now. ![Screenshot_2023-08-08-13-06-40-82_40deb401b9ffe8e1df2f1cc5ba480b12](https://github.com/karpathy/nanoGPT/assets/36001854/bb8d9b87-d677-441b-9965-71269c9b56a7) Val loss is at 2.95.

@ziqi-zhang Batch size is 20. Config:
wandb_run_name = 'gpt2-124M'
batch_size = 20
block_size = 1024
gradient_accumulation_steps = 5 * 8
max_iters = 600000
lr_decay_iters = 600000
eval_interval = 1000
eval_iters = 200
log_interval...
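A minimal sketch of what that config implies per optimizer step, assuming (as in nanoGPT's train.py) that `gradient_accumulation_steps` counts micro-steps summed over all processes:

```python
# Hypothetical calculation of tokens consumed per optimizer iteration
# for the configuration above; variable names mirror the config keys.
batch_size = 20                      # sequences per micro-batch
block_size = 1024                    # context length in tokens
gradient_accumulation_steps = 5 * 8  # micro-steps before each optimizer update

tokens_per_iter = gradient_accumulation_steps * batch_size * block_size
print(tokens_per_iter)  # 819200 tokens per iteration
```

At 600000 max_iters this adds up to roughly 0.5T token presentations, which is why runs at this scale take days even on capable hardware.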

I am using two Quadro RTX 6000 GPUs, each with 24576 MiB of memory.

Converged configuration: batch size = 20, gradient accumulation steps = 40. Original config: batch size = 12, gradient accumulation steps = 40. Earlier configurations which failed to converge: i) batch size...
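The difference between the converged and original configurations can be sketched as a simple effective-batch-size comparison (sequences seen per optimizer step), assuming both runs kept the same accumulation count:

```python
# Hypothetical comparison of sequences per optimizer step for the two
# configurations mentioned above (same gradient accumulation count).
grad_accum_steps = 40

converged = 20 * grad_accum_steps  # batch size 20 -> 800 sequences per step
original = 12 * grad_accum_steps   # batch size 12 -> 480 sequences per step
print(converged, original)  # 800 480
```

A larger effective batch gives smoother gradient estimates per step, which is one plausible reason the batch-size-20 run converged where smaller settings struggled.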