banyan-god

16 comments by banyan-god

Is it, though? That was the value in train.py. Either way, I tried a few runs but had no luck. ![W B Chart 4_3_2024, 1 33 41 PM](https://github.com/karpathy/nanoGPT/assets/153394752/a6259a5a-a097-460f-ac21-8ace0c925b0f) ![W B Chart 4_3_2024,...

> @banyan-god, did you try to match the total batch size of ~0.5M? batch_size * num_of_gpus * gradaccum > 500.
>
> Your current total batch size is 40% of...
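To put that rule in concrete numbers, here is a quick sketch using the stock config/train_gpt2.py values (batch_size 12, block_size 1024, gradient accumulation of 5 per GPU on 8 GPUs); these are the upstream defaults, not necessarily the run under discussion:

```python
# Effective batch size per optimizer step in nanoGPT-style DDP training.
# Illustrative values from the stock config/train_gpt2.py, assuming 8 GPUs.
batch_size = 12     # micro-batch: sequences per GPU per forward pass
num_gpus = 8
grad_accum = 5      # gradient accumulation micro-steps per GPU
block_size = 1024   # tokens per sequence

sequences_per_step = batch_size * num_gpus * grad_accum  # 480, ~ the suggested 500
tokens_per_step = sequences_per_step * block_size        # 491,520, i.e. ~0.5M

print(f"{sequences_per_step} sequences, {tokens_per_step:,} tokens per step")
```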

@yalding So I started another job today with ~572.06M parameters and gradient accumulation of 40, as you suggested. Will report back on progress and whether it explodes.
```
always_save_checkpoint:true
backend:"nccl"
batch_size:5
beta1:0.9...
```

@yalding OK, I rolled back all the hyperparameter changes and am just running with the following: `torchrun --standalone --nproc_per_node=2 train.py config/train_gpt2.py`
```
always_save_checkpoint:true
backend:"nccl"
batch_size:12
beta1:0.9
beta2:0.95
bias:false
block_size:1024
compile:true...
```
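Note that, if I read upstream train.py correctly, the configured gradient_accumulation_steps is divided by the DDP world size at startup, so the 2-GPU launch above should still land near 0.5M tokens per iteration. A sketch of that startup math, mirroring the "tokens per iteration" line train.py prints:

```python
# Mirrors nanoGPT train.py's startup math (a sketch, assuming the upstream
# behavior of splitting grad accum across DDP ranks).
gradient_accumulation_steps = 40   # from config/train_gpt2.py (5 * 8)
batch_size = 12
block_size = 1024
ddp_world_size = 2                 # --nproc_per_node=2

gradient_accumulation_steps //= ddp_world_size   # 20 micro-steps per rank
tokens_per_iter = (gradient_accumulation_steps * ddp_world_size
                   * batch_size * block_size)
print(f"tokens per iteration will be: {tokens_per_iter:,}")   # 491,520
```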

@yalding Unfortunately, that didn't work either.

@yalding
```
always_save_checkpoint:true
backend:"nccl"
batch_size:12
beta1:0.9
beta2:0.95
bias:false
block_size:1024
compile:true
dataset:"openwebtext"
decay_lr:true
device:"cuda"
dropout:0
dtype:"bfloat16"
eval_interval:1000
eval_iters:200
eval_only:false
grad_clip:1
gradient_accumulation_steps:40
init_from:"scratch"
learning_rate:0.0006
log_interval:10
lr_decay_iters:600000
max_iters:600000
min_lr:0.00006
n_embd:768
n_head:12
n_layer:12
out_dir:"out"...
```

I am also wondering if it is possibly something to do with the PyTorch version or the OpenWebText dataset.

@seanxwzhang I want to say it is a combination of the tokenizer and the dataset. When I switched over to the GPT-4 tokenizer, the problem disappeared.
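For anyone wanting to try the same swap: a minimal sketch of tokenizing with tiktoken's cl100k_base (the GPT-4 encoding) instead of gpt2, modeled loosely on nanoGPT's data/openwebtext/prepare.py (the function shape here is an assumption, not the exact upstream script). The main gotchas are the wider token ids and the larger vocab size:

```python
import numpy as np
import tiktoken

# GPT-4's encoding instead of nanoGPT's default GPT-2 encoding.
enc = tiktoken.get_encoding("cl100k_base")   # enc.n_vocab == 100277

def tokenize(example):
    # encode_ordinary ignores special tokens, as in nanoGPT's prepare.py
    ids = enc.encode_ordinary(example["text"])
    ids.append(enc.eot_token)                # delimit documents
    return {"ids": ids, "len": len(ids)}

# GPT-2 ids fit in uint16, but cl100k_base ids exceed 65535, so the
# memmapped .bin files need a wider dtype:
dtype = np.uint32

# The model config also needs a matching (padded) vocab size, e.g.:
# vocab_size = 100352   # 100277 rounded up to a multiple of 64
```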

[514e8a53-74e0-4d77-a61e-53a416f3ec3a.txt](https://github.com/user-attachments/files/17966626/514e8a53-74e0-4d77-a61e-53a416f3ec3a.txt) I was able to reproduce it on 4x 4090s in ~31 minutes: `step:1750/1750 val_loss:3.2783 train_time:1889410ms step_avg:1085.87ms`
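For what it's worth, those log numbers check out, and they hint that the reported step_avg excludes the first 10 (warmup) steps; that exclusion is an inference from the arithmetic, not something stated in the log:

```python
# Quick sanity check of the log line above.
train_time_ms = 1_889_410
step_avg_ms = 1085.87
steps = 1750

print(train_time_ms / 1000 / 60)    # ~31.5 -> matches "~31 minutes"
print(train_time_ms / step_avg_ms)  # ~1740 = 1750 - 10, suggesting the
                                    # average skips the first 10 steps
```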

> A note: The current cost per run on an 8xH100 is about $1.90 (since it's about $3/hr for SXM H100s)
>
> Personally, when I don't feel like spending...
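Working backwards from those two quoted figures gives the implied run length; the ~4.75-minute duration below is inferred from the arithmetic, not stated in the thread:

```python
# 8x H100 SXM at ~$3/hr per GPU.
hourly_cost = 3.0 * 8            # $24/hr for the node
cost_per_run = 1.90              # quoted figure
run_minutes = cost_per_run / hourly_cost * 60
print(f"{run_minutes:.2f} min")  # ~4.75 minutes per run
```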