
The simplest, fastest repository for training/finetuning medium-sized GPTs.

297 nanoGPT issues

Hi, my system has 16 GPUs per node. However, if I run `torchrun --standalone --nproc_per_node=16 train.py config/train_gpt2.py`, the training crashes. How can I use all 16 GPUs?
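One thing worth checking when changing `--nproc_per_node`: nanoGPT's `train.py` splits `gradient_accumulation_steps` evenly across DDP ranks and asserts divisibility, so a step count tuned for 8 GPUs can trip an assertion on 16. A minimal sketch of that arithmetic (the numbers mirror `config/train_gpt2.py`'s 5 × 8 = 40 default, but check your own config; the helper name is ours, not nanoGPT's):

```python
# Sketch of the per-rank split nanoGPT performs at startup. If the total
# gradient_accumulation_steps is not divisible by the world size, the
# assertion fires and the launch "crashes" before training starts.
def split_grad_accum(gradient_accumulation_steps: int, ddp_world_size: int) -> int:
    assert gradient_accumulation_steps % ddp_world_size == 0, (
        f"gradient_accumulation_steps={gradient_accumulation_steps} must be "
        f"divisible by world size {ddp_world_size}"
    )
    return gradient_accumulation_steps // ddp_world_size

print(split_grad_accum(40, 8))   # 8-GPU default -> 5 micro-steps per rank
print(split_grad_accum(48, 16))  # a 16-GPU-friendly choice -> 3 per rank
```

With the stock value of 40 and 16 ranks, 40 % 16 != 0, so picking a multiple of 16 (e.g. 48) is one candidate fix; the actual crash log would confirm whether this is the cause.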

Per the [recent paper from Meta](https://arxiv.org/abs/2404.19737), it appears that models that predict multiple future tokens can exhibit significantly greater sample efficiency than models trained only on next-token prediction, plus the...
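To make the idea concrete, here is a dependency-free sketch of the data side of multi-token prediction: each position is paired with its next k tokens instead of just the next one, so a model could be trained with k prediction heads. This only illustrates target construction; it does not reproduce the Meta paper's architecture, and the function name is ours:

```python
# For each position t, pair the causal context tokens[:t+1] with the
# k future tokens tokens[t+1 : t+1+k] as training targets.
def multi_token_targets(tokens, k):
    pairs = []
    for t in range(len(tokens) - k):
        pairs.append((tokens[: t + 1], tokens[t + 1 : t + 1 + k]))
    return pairs

pairs = multi_token_targets([10, 11, 12, 13, 14], k=2)
# first pair: context [10], targets [11, 12]
```

Standard next-token prediction is recovered at k = 1.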

### **Problem:** Use of `GradScaler` gives `AssertionError` in `train.py` while using `device = cpu`.

```
Traceback (most recent call last):
  File "/home/brainiac77/github/neural-network-playground/gpt/train.py", line 305, in 
    scaler.scale(loss).backward()
    ^^^^^^^^^^^^^^^^^^
  File "/home/brainiac77/miniconda3/envs/vision-1/lib/python3.12/site-packages/torch/cuda/amp/grad_scaler.py", line 203, ...
```
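`GradScaler` lives under `torch.cuda.amp` and assumes CUDA float16 training; nanoGPT guards against this by constructing the scaler with `enabled=...` so it becomes a no-op otherwise. A dependency-free sketch of that decision logic (the helper is ours, for illustration):

```python
# The scaler should only be active for float16 training on a GPU; on CPU,
# a disabled scaler makes scaler.scale(loss).backward() behave like a
# plain loss.backward().
def scaler_enabled(device: str, dtype: str) -> bool:
    return device == "cuda" and dtype == "float16"

print(scaler_enabled("cpu", "float16"))   # False -> run without scaling
print(scaler_enabled("cuda", "float16"))  # True  -> scaler.scale(loss).backward()
```

So one likely fix is to pass `dtype` other than `float16` (or an `enabled=False` scaler) when `device = cpu`, rather than calling an active scaler there.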

```python
class CausalSelfAttention(nn.Module):

    def forward(self, x):
        B, T, C = x.size()  # batch size, sequence length, embedding dimensionality (n_embd)
        # calculate query, key, values for all heads in batch and...
```
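For readers following the truncated excerpt, here is a dependency-free, single-head sketch of what that `forward` computes: scaled dot-product attention with a causal mask. nanoGPT's real module projects q, k, v with `nn.Linear` and runs all heads in one batched tensor op; this toy version takes q, k, v directly so the masking is easy to see:

```python
import math

# Single-head causal attention over lists of vectors. Position t may only
# attend to positions 0..t (the causal mask), with softmax over scaled
# dot-product scores.
def causal_attention(q, k, v):
    T, d = len(q), len(q[0])
    out = []
    for t in range(T):
        # scores against past-and-present positions only
        scores = [sum(q[t][i] * k[s][i] for i in range(d)) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        w = [e / total for e in exps]
        out.append([sum(w[s] * v[s][i] for s in range(t + 1)) for i in range(d)])
    return out

q = k = v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_attention(q, k, v)
# position 0 can only attend to itself, so out[0] equals v[0]
```

The batched `(B, T, C)` version in `model.py` is this same computation vectorized across batch and heads.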

We have a project, www.opendigitaltwin.top, that provides an SDK for CAX and has collected more than 60 open-source packages. We want to put all the code together and add more comments...

Is this appropriate for a dataset that is only a single 80 KB Markdown file? If there are more appropriate ways to train a small language model like this, please let me know.
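For a corpus that small, a character-level vocabulary (as in nanoGPT's `data/shakespeare_char/prepare.py`) is often a better fit than the GPT-2 BPE tokenizer used by `train_gpt2.py`. A sketch of that preparation step, assuming your file is the whole dataset (the helper name and 90/10 split are ours, mirroring the shakespeare_char recipe):

```python
# Build a character-level vocabulary from raw text and encode a 90/10
# train/val split, as nanoGPT's char-level example does before writing
# train.bin / val.bin.
def build_char_dataset(text):
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    encode = lambda s: [stoi[c] for c in s]
    decode = lambda ids: "".join(itos[i] for i in ids)
    n = len(text)
    train_ids = encode(text[: int(n * 0.9)])
    val_ids = encode(text[int(n * 0.9):])
    return train_ids, val_ids, encode, decode

train_ids, val_ids, encode, decode = build_char_dataset("hello markdown")
# round-trip check: decode(encode(s)) recovers s
```

With an 80 KB file you would also want a much smaller model than GPT-2 (fewer layers/heads, smaller `n_embd`, small `block_size`) and dropout, since overfitting is the main risk at that scale.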

First, thanks for the wonderful project; I have learned a lot by reading the code. https://github.com/karpathy/nanoGPT/blob/325be85d9be8c81b436728a420e85796c57dba7e/model.py#L166 I see the biases are initialized to zeros, but it should have no effect (because...