kranky
Hi, I am using around 42 GB of GPU memory. Dataset: OpenWebText (20 GB). Batch size = 16. I am now on the 8th day of training; val loss is at 2.95.
@ziqi-zhang batch size is 20. My config:

wandb_run_name = 'gpt2-124M'
batch_size = 20
block_size = 1024
gradient_accumulation_steps = 5 * 8
max_iters = 600000
lr_decay_iters = 600000
eval_interval = 1000
eval_iters = 200
log_interval...
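As a sanity check, here is a rough tokens-per-optimizer-step estimate implied by this config (a minimal sketch; it assumes gradient_accumulation_steps = 5 * 8 = 40 counts the total micro-steps across all processes, as in nanoGPT's default convention, which may not match every training script):

```python
# Rough tokens-per-optimizer-step estimate for the config above.
# Assumes gradient_accumulation_steps is the total micro-step count
# across all GPUs (nanoGPT-style); divide by world size first if your
# script interprets it per process.
batch_size = 20
block_size = 1024
gradient_accumulation_steps = 5 * 8  # 40 micro-steps

tokens_per_iter = gradient_accumulation_steps * batch_size * block_size
print(f"tokens per iteration: {tokens_per_iter:,}")  # 819,200
```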
I am using two Quadro RTX 6000 GPUs, 24576 MiB each.
Converged configuration: batch size = 20, gradient accumulation steps = 40.
Original config: batch size = 12, gradient accumulation steps = 40.
Earlier configurations which failed to converge:
i) batch size...
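For comparison, a quick sketch of the effective batch sizes these settings imply (assuming "gradient steps" means gradient accumulation steps and block_size = 1024 from the config above; these numbers are my arithmetic, not from the thread):

```python
# Effective batch per optimizer step (sequences and tokens), assuming
# gradient accumulation steps as listed and block_size = 1024.
block_size = 1024
configs = {
    "converged": {"batch_size": 20, "grad_accum": 40},
    "original":  {"batch_size": 12, "grad_accum": 40},
}
for name, c in configs.items():
    seqs = c["batch_size"] * c["grad_accum"]
    toks = seqs * block_size
    print(f"{name}: {seqs} sequences/step, {toks:,} tokens/step")
# converged: 800 sequences/step, 819,200 tokens/step
# original:  480 sequences/step, 491,520 tokens/step
```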