morphpiece
Hi. So I was training a new tokenizer from the Llama tokenizer (meta-llama/Llama-2-7b-hf) on a medium-sized corpus (the FineWeb-10BT sample: 15 million documents with an average length of 2,300 characters). After...
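For context, retraining a tokenizer from an existing one is typically done with Hugging Face's `train_new_from_iterator`. The sketch below is a minimal illustration, not the exact script used here; the FineWeb dataset/config names, the batch size, and the 32,000 vocabulary size (Llama-2's default) are assumptions.

```python
# Minimal sketch: retrain the Llama-2 tokenizer on the FineWeb 10BT sample.
# Assumes access to the gated meta-llama repo and the fast (tokenizers-backed)
# tokenizer, which is what train_new_from_iterator requires.
from datasets import load_dataset
from transformers import AutoTokenizer

base = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Stream the corpus so the 10BT sample never has to fit in memory.
ds = load_dataset("HuggingFaceFW/fineweb", name="sample-10BT",
                  split="train", streaming=True)

def batch_iterator(batch_size=1000):
    batch = []
    for example in ds:
        batch.append(example["text"])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Keeps the base tokenizer's algorithm and special tokens, but learns a
# fresh vocabulary and merges from the new corpus.
new_tokenizer = base.train_new_from_iterator(batch_iterator(), vocab_size=32000)
new_tokenizer.save_pretrained("llama2-fineweb-tokenizer")
```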
Today I was going to train a gpt3_124m model when I noticed that max_seq_len is hardcoded [here](https://github.com/karpathy/llm.c/blob/d396cd18b71367f79cbaab8f8203e64e578f9ee8/train_gpt2.cu#L653), while at the same time it's a configurable parameter [here](https://github.com/karpathy/llm.c/blob/d396cd18b71367f79cbaab8f8203e64e578f9ee8/train_gpt2.cu#L1513). Then I...
This refers to reading checkpoints in HF format ([issue](https://github.com/karpathy/llm.c/issues/502)). In the spirit of readability, I have tried to keep the code as close as possible to [train_gpt2.py](https://github.com/karpathy/llm.c/blob/master/train_gpt2.py). I have also...
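For readers unfamiliar with the pattern train_gpt2.py follows, the sketch below shows the standard (nanoGPT-style) way to read an HF-format GPT-2 checkpoint with `transformers` and copy its weights into a local model. It assumes a local `GPT` module whose parameter names match the HF ones; that class is hypothetical here. The one real wrinkle is that HF GPT-2 uses Conv1D layers, whose weights are stored transposed relative to a plain nn.Linear.

```python
# Sketch: copy weights from an HF-format GPT-2 checkpoint into a local
# model. `model` is assumed to be a nanoGPT-style GPT whose state_dict
# keys match GPT2LMHeadModel's (transformer.h.{i}.attn.c_attn.weight, ...).
import torch
from transformers import GPT2LMHeadModel

def load_hf_checkpoint(model, hf_path="gpt2"):
    sd_hf = GPT2LMHeadModel.from_pretrained(hf_path).state_dict()
    sd = model.state_dict()
    # These Conv1D weights are stored transposed in HF checkpoints.
    transposed = ("attn.c_attn.weight", "attn.c_proj.weight",
                  "mlp.c_fc.weight", "mlp.c_proj.weight")
    with torch.no_grad():
        for k, v in sd_hf.items():
            if k.endswith(".attn.bias") or k.endswith(".attn.masked_bias"):
                continue  # causal-mask buffers, not learnable parameters
            if any(k.endswith(t) for t in transposed):
                assert sd[k].shape == v.t().shape
                sd[k].copy_(v.t())
            else:
                assert sd[k].shape == v.shape
                sd[k].copy_(v)
    return model
```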
The instructions in the benchmark-evaluation README.md need a couple of additions to work: 1. deepspeed needs to be installed (`pip install deepspeed`). 2. Remove the spaces in the MODEL variable assignment (shell variable assignments cannot have spaces around `=`)...