nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
DDP wraps the real module behind `model.module`, so we need to account for that when calling custom model methods.
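A minimal sketch of the unwrapping idiom; `TinyModel` and `num_params` here are hypothetical stand-ins for the real model and its custom methods:

```python
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class TinyModel(nn.Module):
    """Hypothetical stand-in for nanoGPT's GPT model."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x):
        return self.fc(x)

    def num_params(self):  # a custom method, like GPT.get_num_params()
        return sum(p.numel() for p in self.parameters())

model = TinyModel()
ddp = dist.is_available() and dist.is_initialized()  # True only under torchrun
if ddp:
    model = DDP(model)

# DDP only proxies forward(); custom attributes like num_params live on the
# wrapped module, so unwrap before calling them:
raw_model = model.module if ddp else model
print(raw_model.num_params())
```

nanoGPT's train.py uses the same pattern (`raw_model = model.module if ddp else model`) before checkpointing and calling `estimate_mfu`.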
vocab_size not found in data/openwebtext/meta.pkl, using GPT-2 default of 50257
Initializing a new model from scratch
number of parameters: 124.34M
compiling the model... (takes a ~minute)
To use data.metrics please...
![image](https://user-images.githubusercontent.com/5590961/216866285-72afbfa0-c4a0-4f91-89b4-bafec3361f96.png) I am using Ubuntu on Windows (WSL) with an RTX 3080, running `python3 train.py config/train_shakespeare_char.py --compile=False --batch_size=128`. Even if I increase the batch size a bit, the utilization of the GPU computation...
Can I volunteer to add a tests folder to the root and some unit tests for the train script? Any preference for pytest, unittest, or something else? I noticed the...
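If it helps, a minimal pytest sketch of what such a test could look like (assuming nanoGPT's `model.GPT` and `GPTConfig`; the file name and assertions are hypothetical):

```python
# tests/test_model.py -- hypothetical starting point; run with `pytest`
import torch
from model import GPT, GPTConfig

def test_forward_shapes_and_finite_loss():
    # A deliberately tiny config so the test runs in seconds on CPU.
    config = GPTConfig(block_size=32, vocab_size=64, n_layer=2,
                       n_head=2, n_embd=32, dropout=0.0, bias=False)
    model = GPT(config)
    idx = torch.randint(0, config.vocab_size, (4, config.block_size))
    logits, loss = model(idx, targets=idx)
    assert logits.shape == (4, config.block_size, config.vocab_size)
    assert torch.isfinite(loss)
```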
Not exactly sure why, presumably the compiler is doing some dark magic, but I get *slightly* better performance when using `nn.GELU()` instead of `new_gelu`.
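For reference, `nn.GELU()` defaults to the exact erf-based GELU, while `nn.GELU(approximate='tanh')` reproduces `new_gelu`'s tanh approximation; a quick sketch to compare them:

```python
import math
import torch
import torch.nn as nn

# nanoGPT's hand-written tanh approximation of GELU:
def new_gelu(x):
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi)
                      * (x + 0.044715 * torch.pow(x, 3.0))))

x = torch.randn(4, 16)
# approximate='tanh' matches new_gelu; the default ('none') is the exact form.
print(torch.allclose(new_gelu(x), nn.GELU(approximate='tanh')(x), atol=1e-6))  # True
print(torch.allclose(new_gelu(x), nn.GELU()(x), atol=1e-6))  # usually False (different math)
```

So if the two need to match numerically, `approximate='tanh'` is the like-for-like swap.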
Like the title says, I've been getting "loss: nan" after training for about 2000 iterations. The only things I changed in the script were the block size (from 1024 to...
I would like to use my own data to train an AI to play Codenames. Is there anybody interested enough to give me a little guidance, please? **What...
I used 4 GPUs on 1 node: `torchrun --standalone --nproc_per_node=4 train.py --compile=False`. But the training speed is the same as with 1 GPU. Why?
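One quick sanity check is to print the world size each process sees; a hypothetical `check_ddp.py`, launched the same way (assumes NCCL and the four GPUs from the command above):

```python
# check_ddp.py -- launch with: torchrun --standalone --nproc_per_node=4 check_ddp.py
import os
import torch.distributed as dist

dist.init_process_group(backend="nccl")  # torchrun sets RANK/WORLD_SIZE env vars
rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])
print(f"rank {rank} of {world_size}")  # expect 4 lines, each reporting 4 processes
dist.destroy_process_group()
```

If the world size printed is 1, the training processes were not actually launched under DDP as intended.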
I tried to quickly train a baby GPT with the settings provided in the config, using PyTorch 2.0 on an RTX 2080 Ti: `python train.py config/train_shakespeare_char.py --compile=True`. I got a segmentation fault...