llm.c
Is max_seq_len a configurable or hardcoded parameter?
Today I was going to train a gpt3_124m model when I noticed that max_seq_len is hardcoded here, while at the same time it's a configurable parameter here. I then ran the training executable with and without `-t 2048` and found that the number of model parameters stays the same at 124475904.
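For context on why I expected the count to change: in a GPT-2-style model, max_seq_len sizes the learned positional-embedding table (maxT × C), so a larger maxT should mean more parameters. Here is a rough back-of-the-envelope sketch, assuming GPT-2 124M geometry (C=768, L=12) and a vocab padded to 50304 — these are my assumptions, not something I checked against the llm.c source:

```python
def gpt2_params(max_seq_len, Vp=50304, C=768, L=12):
    """Rough GPT-2-style parameter count (lm_head weights tied to wte)."""
    wte = Vp * C                   # token embedding table
    wpe = max_seq_len * C          # learned positional embedding table
    block = 12 * C * C + 13 * C    # qkv/proj/mlp weights+biases + 2 layernorms
    return wte + wpe + L * block + 2 * C  # + final layernorm

print(gpt2_params(1024))  # 124475904 -- matches the count I observed
print(gpt2_params(2048))  # 125262336 -- what a 2048-wide wpe table would give
```

The observed 124475904 matches maxT=1024 exactly, so it looks like `-t 2048` only sets the sequence length used during training, while the model's own maxT stays fixed — but I may be misreading the code.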
is there a 🐛 bounty? 😃