nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Results: 297 nanoGPT issues (sorted by recently updated)

DDP wraps the real module behind `model.module`, so we need to account for that when calling custom model methods.
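A minimal sketch of the unwrapping idiom this issue describes (the `ToyModel`, its `custom_method`, and the gloo backend are illustrative stand-ins, not the repo's actual code):

```python
import os
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Hypothetical toy module standing in for the GPT model.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x):
        return self.fc(x)

    def custom_method(self):  # stand-in for any custom model method
        return "called on the real module"

model = ToyModel()
ddp = int(os.environ.get('RANK', -1)) != -1   # running under torchrun?
if ddp:
    torch.distributed.init_process_group(backend='gloo')  # CPU-friendly backend for the demo
    model = DDP(model)

# DDP hides the original object behind .module, so unwrap before
# calling anything that is not forward():
raw_model = model.module if ddp else model
print(raw_model.custom_method())
```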

vocab_size not found in data/openwebtext/meta.pkl, using GPT-2 default of 50257
Initializing a new model from scratch
number of parameters: 124.34M
compiling the model... (takes a ~minute)
To use data.metrics please...
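For context, a hedged sketch of the fallback the first log line describes: read `vocab_size` from the dataset's `meta.pkl` if the file exists, otherwise fall back to the GPT-2 default of 50257 (the dictionary key and file layout here are assumptions based on the message, not a copy of the repo's code):

```python
import os
import pickle

data_dir = os.path.join('data', 'openwebtext')   # dataset directory named in the log
meta_path = os.path.join(data_dir, 'meta.pkl')

vocab_size = 50257  # GPT-2 default used when no meta.pkl is found
if os.path.exists(meta_path):
    with open(meta_path, 'rb') as f:
        meta = pickle.load(f)
    vocab_size = meta.get('vocab_size', vocab_size)

print(f"vocab_size = {vocab_size}")
```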

![image](https://user-images.githubusercontent.com/5590961/216866285-72afbfa0-c4a0-4f91-89b4-bafec3361f96.png) I am using Ubuntu on Windows with an RTX 3080. `python3 train.py config/train_shakespeare_char.py --compile=False --batch_size=128` Even if I increase the batch size a bit, the utilization of the GPU computation...

Can I volunteer to add a tests folder to the root and some unit tests for the train script? Any preference for pytest or unittest, or something else? I noticed the...
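As a rough illustration of what such a test could look like (pytest assumed; the tiny config values are illustrative and the import assumes `model.py` with `GPT`/`GPTConfig` is reachable from the repo root):

```python
import torch

from model import GPT, GPTConfig  # assumes tests run from the repo root

def test_forward_shapes():
    # A deliberately tiny model so the test runs in seconds on CPU.
    config = GPTConfig(block_size=32, vocab_size=64, n_layer=2, n_head=2, n_embd=32)
    model = GPT(config)
    idx = torch.randint(0, config.vocab_size, (2, 16))
    logits, loss = model(idx, targets=idx)
    assert logits.shape == (2, 16, config.vocab_size)
    assert torch.isfinite(loss)
```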

Not exactly sure why (presumably the compiler is doing some dark magic), but I get *slightly* better performance when using `nn.GELU()` instead of `new_gelu`.
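For reference, a sketch contrasting the two variants: the hand-written formula below is the usual tanh "new GELU" approximation, while `nn.GELU()` defaults to the erf-based form; `nn.GELU(approximate='tanh')` needs a reasonably recent PyTorch, and the equivalence check is only illustrative:

```python
import math
import torch
import torch.nn as nn

def new_gelu(x):
    # tanh approximation of GELU, as used in the original GPT-2 codebase
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))

x = torch.randn(4, 8)
exact = nn.GELU()(x)                           # erf-based GELU (the default)
tanh_approx = nn.GELU(approximate='tanh')(x)   # built-in tanh approximation

# The hand-written version should agree with the built-in tanh variant
# to within floating-point tolerance; the erf variant differs slightly.
print(torch.allclose(new_gelu(x), tanh_approx, atol=1e-6))
print((exact - tanh_approx).abs().max())
```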

Like the title says, I've been receiving "loss: nan" after training for about 2000 iterations. The only thing I changed in the script was the block size (from 1024 to...

I would like to use my own data to train an AI for playing Codenames. Is there anybody interested enough to give me a little guidance, please? **What...

I used 4 GPUs on 1 node: `torchrun --standalone --nproc_per_node=4 train.py --compile=False`, but the training speed is the same as with 1 GPU. Why?
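As a hedged sketch, this is roughly how a script can confirm DDP is actually active under torchrun (the environment variable names are the standard ones torchrun sets; the nccl backend assumes CUDA GPUs):

```python
import os
import torch

# torchrun sets these for each worker; if RANK is absent, the script
# is running single-process and DDP never engages.
ddp = int(os.environ.get('RANK', -1)) != -1
if ddp:
    torch.distributed.init_process_group(backend='nccl')  # assumes one CUDA GPU per worker
    local_rank = int(os.environ['LOCAL_RANK'])
    torch.cuda.set_device(local_rank)
    print(f"DDP active: rank {os.environ['RANK']} of {os.environ['WORLD_SIZE']}")
else:
    print("No RANK in the environment: running on a single process/GPU")
```

Launching with `torchrun --standalone --nproc_per_node=4 train.py` should print four "DDP active" lines; if the single-process message appears instead, the workers never joined a process group and the run is effectively single-GPU.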

I tried to quickly train a baby GPT with the settings provided in the config, using PyTorch 2.0 on an RTX 2080 Ti: `python train.py config/train_shakespeare_char.py --compile=True` I got Segmentation fault...