
The simplest, fastest repository for training/finetuning medium-sized GPTs.

297 nanoGPT issues

Hi, my system has 16 GPUs per node. However, if I run `torchrun --standalone --nproc_per_node=16 train.py config/train_gpt2.py`, the training crashes. How can I use all 16 GPUs?
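One thing worth checking when changing `--nproc_per_node`: nanoGPT's `train.py` splits `gradient_accumulation_steps` evenly across DDP ranks and asserts divisibility, so a step count tuned for 8 GPUs can trip an assertion on 16. A minimal sketch of that arithmetic (the numbers mirror `config/train_gpt2.py`'s 5 × 8 = 40 default, but check your own config; the helper name is ours, not nanoGPT's):

```python
# Sketch of the per-rank split nanoGPT performs at startup. If the total
# gradient_accumulation_steps is not divisible by the world size, the
# assertion fires and the launch "crashes" before training starts.
def split_grad_accum(gradient_accumulation_steps: int, ddp_world_size: int) -> int:
    assert gradient_accumulation_steps % ddp_world_size == 0, (
        f"gradient_accumulation_steps={gradient_accumulation_steps} must be "
        f"divisible by world size {ddp_world_size}"
    )
    return gradient_accumulation_steps // ddp_world_size

print(split_grad_accum(40, 8))   # 8-GPU default -> 5 micro-steps per rank
print(split_grad_accum(48, 16))  # a 16-GPU-friendly choice -> 3 per rank
```

With the stock value of 40 and 16 ranks, 40 % 16 != 0, so picking a multiple of 16 (e.g. 48) is one candidate fix; the actual crash log would confirm whether this is the cause.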

Per the [recent paper from Meta](https://arxiv.org/abs/2404.19737), it appears that models that predict multiple future tokens can exhibit significantly greater sample efficiency than models trained only on next-token prediction, plus the...
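To make the idea concrete, here is a dependency-free sketch of the data side of multi-token prediction: each position is paired with its next k tokens instead of just the next one, so a model could be trained with k prediction heads. This only illustrates target construction; it does not reproduce the Meta paper's architecture, and the function name is ours:

```python
# For each position t, pair the causal context tokens[:t+1] with the
# k future tokens tokens[t+1 : t+1+k] as training targets.
def multi_token_targets(tokens, k):
    pairs = []
    for t in range(len(tokens) - k):
        pairs.append((tokens[: t + 1], tokens[t + 1 : t + 1 + k]))
    return pairs

pairs = multi_token_targets([10, 11, 12, 13, 14], k=2)
# first pair: context [10], targets [11, 12]
```

Standard next-token prediction is recovered at k = 1.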

### **Problem:** Use of `GradScaler` gives `AssertionError` in `train.py` while using `device = cpu`.

```
Traceback (most recent call last):
  File "/home/brainiac77/github/neural-network-playground/gpt/train.py", line 305, in 
    scaler.scale(loss).backward()
    ^^^^^^^^^^^^^^^^^^
  File "/home/brainiac77/miniconda3/envs/vision-1/lib/python3.12/site-packages/torch/cuda/amp/grad_scaler.py", line 203, ...
```
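`GradScaler` lives under `torch.cuda.amp` and assumes CUDA float16 training; nanoGPT guards against this by constructing the scaler with `enabled=...` so it becomes a no-op otherwise. A dependency-free sketch of that decision logic (the helper is ours, for illustration):

```python
# The scaler should only be active for float16 training on a GPU; on CPU,
# a disabled scaler makes scaler.scale(loss).backward() behave like a
# plain loss.backward().
def scaler_enabled(device: str, dtype: str) -> bool:
    return device == "cuda" and dtype == "float16"

print(scaler_enabled("cpu", "float16"))   # False -> run without scaling
print(scaler_enabled("cuda", "float16"))  # True  -> scaler.scale(loss).backward()
```

So one likely fix is to pass `dtype` other than `float16` (or an `enabled=False` scaler) when `device = cpu`, rather than calling an active scaler there.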

```python
class CausalSelfAttention(nn.Module):

    def forward(self, x):
        B, T, C = x.size()  # batch size, sequence length, embedding dimensionality (n_embd)
        # calculate query, key, values for all heads in batch and...
```
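For readers following the truncated excerpt, here is a dependency-free, single-head sketch of what that `forward` computes: scaled dot-product attention with a causal mask. nanoGPT's real module projects q, k, v with `nn.Linear` and runs all heads in one batched tensor op; this toy version takes q, k, v directly so the masking is easy to see:

```python
import math

# Single-head causal attention over lists of vectors. Position t may only
# attend to positions 0..t (the causal mask), with softmax over scaled
# dot-product scores.
def causal_attention(q, k, v):
    T, d = len(q), len(q[0])
    out = []
    for t in range(T):
        # scores against past-and-present positions only
        scores = [sum(q[t][i] * k[s][i] for i in range(d)) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        w = [e / total for e in exps]
        out.append([sum(w[s] * v[s][i] for s in range(t + 1)) for i in range(d)])
    return out

q = k = v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_attention(q, k, v)
# position 0 can only attend to itself, so out[0] equals v[0]
```

The batched `(B, T, C)` version in `model.py` is this same computation vectorized across batch and heads.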

We have a project, www.opendigitaltwin.top, that provides an SDK for CAX and has collected more than 60 open-source packages. We want to put all the code together and add more comments...

Is this appropriate for a dataset that is only a single 80 KB Markdown file? If there are more appropriate ways to train a small language model like this, please let me know.
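For a corpus that small, a character-level vocabulary (as in nanoGPT's `data/shakespeare_char/prepare.py`) is often a better fit than the GPT-2 BPE tokenizer used by `train_gpt2.py`. A sketch of that preparation step, assuming your file is the whole dataset (the helper name and 90/10 split are ours, mirroring the shakespeare_char recipe):

```python
# Build a character-level vocabulary from raw text and encode a 90/10
# train/val split, as nanoGPT's char-level example does before writing
# train.bin / val.bin.
def build_char_dataset(text):
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    encode = lambda s: [stoi[c] for c in s]
    decode = lambda ids: "".join(itos[i] for i in ids)
    n = len(text)
    train_ids = encode(text[: int(n * 0.9)])
    val_ids = encode(text[int(n * 0.9):])
    return train_ids, val_ids, encode, decode

train_ids, val_ids, encode, decode = build_char_dataset("hello markdown")
# round-trip check: decode(encode(s)) recovers s
```

With an 80 KB file you would also want a much smaller model than GPT-2 (fewer layers/heads, smaller `n_embd`, small `block_size`) and dropout, since overfitting is the main risk at that scale.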

First, thanks for the wonderful project; I have learned a lot by reading the code. https://github.com/karpathy/nanoGPT/blob/325be85d9be8c81b436728a420e85796c57dba7e/model.py#L166 I see the biases are initialized to zeros, but it should have no effect (because...