llm.c

inf loss at big batch

Open · karpathy opened this issue · 1 comment

Just creating a TODO. Large batch sizes work now that the size_t bug is fixed:

./train_gpt2cu -b 36 -v 200 -s 200 -i data/TinyStories

works, but 48, which should fit in memory, doesn't:

./train_gpt2cu -b 48 -v 200 -s 200 -i data/TinyStories

val loss is -nan and train loss stays at inf.
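One hypothesis worth ruling out (not confirmed anywhere in this thread) is that some index or element count is still computed in a 32-bit `int`. The sketch below assumes T = 1024 and V = 50257 (GPT-2's context length and vocab size), neither of which is stated in the issue. It checks whether B*T*V, the number of logits, fits in a signed 32-bit int. `num_logits`, `wrap_int32` and `max_batch_for_int32` are hypothetical helpers for illustration only.

```python
# Hypothetical sketch: does the logits element count B*T*V fit in a signed 32-bit int?
# ASSUMPTION: T=1024 and V=50257 (GPT-2 defaults); the issue does not state them.
INT32_MAX = 2**31 - 1

def num_logits(B: int, T: int = 1024, V: int = 50257) -> int:
    """Exact element count (Python ints never overflow)."""
    return B * T * V

def wrap_int32(x: int) -> int:
    """Value a two's-complement 32-bit int would hold after wraparound.
    (Signed overflow is undefined behavior in C/CUDA; wraparound is just
    what commonly happens in practice.)"""
    return (x + 2**31) % 2**32 - 2**31

def max_batch_for_int32(T: int = 1024, V: int = 50257) -> int:
    """Largest B for which B*T*V still fits in a signed 32-bit int."""
    return INT32_MAX // (T * V)

if __name__ == "__main__":
    for B in (36, 48):
        n = num_logits(B)
        print(f"B={B}: B*T*V={n:,} fits_int32={n <= INT32_MAX} as_int32={wrap_int32(n):,}")
    print("max B for int32:", max_batch_for_int32())
```

Under these assumptions B=36 gives 1,852,674,048 logits (fits) while B=48 gives 2,470,232,064 (does not), and the cutoff is B=41, which would be consistent with 36 working and 48 failing. If a leftover `int` product turns out to be the cause, the usual C/CUDA fix is to widen before multiplying, e.g. `(size_t)b * T * V`, rather than casting the already-overflowed result.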

TODO: track down why, and how to prevent it.
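On the "how to prevent" side, one cheap mitigation (not something proposed in the issue) is to abort as soon as a loss goes non-finite, so a run like the B=48 one stops at the first bad step instead of continuing at inf. This is a language-agnostic sketch in Python. `check_losses` is a hypothetical helper, and in the C code the equivalent test would be `isfinite()` from `<math.h>` applied to the loss value once it is on the host.

```python
import math
from typing import Optional

def check_losses(step: int, train_loss: float, val_loss: Optional[float] = None) -> None:
    """Hypothetical fail-fast guard: raise at the first inf/NaN loss
    instead of continuing to train on a corrupted state."""
    for name, value in (("train", train_loss), ("val", val_loss)):
        if value is not None and not math.isfinite(value):
            raise FloatingPointError(f"step {step}: {name} loss is {value}")

if __name__ == "__main__":
    check_losses(0, 4.21, 4.30)  # both finite: passes silently
    try:
        check_losses(1, float("inf"), float("nan"))
    except FloatingPointError as err:
        print(err)  # step 1: train loss is inf
```

The guard only surfaces the failure earlier; it does not explain why B=48 diverges, so it complements tracking down the root cause rather than replacing it.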

karpathy · Apr 26 '24 22:04

@karpathy, just wanted to check: we've fixed this, right?

ngc92 · Jun 07 '24 10:06