nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
DDP wraps the real module behind `model.module`, so we need to account for that when calling custom model methods.
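A minimal sketch of the unwrapping idiom; `TinyModel` and `num_params` here are hypothetical stand-ins for the real model and its custom methods:

```python
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class TinyModel(nn.Module):
    """Hypothetical stand-in for nanoGPT's GPT model."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x):
        return self.fc(x)

    def num_params(self):  # a custom method, like GPT.get_num_params()
        return sum(p.numel() for p in self.parameters())

model = TinyModel()
ddp = dist.is_available() and dist.is_initialized()  # True only under torchrun
if ddp:
    model = DDP(model)

# DDP only proxies forward(); custom attributes like num_params live on the
# wrapped module, so unwrap before calling them:
raw_model = model.module if ddp else model
print(raw_model.num_params())
```

nanoGPT's train.py uses the same pattern (`raw_model = model.module if ddp else model`) before checkpointing and calling `estimate_mfu`.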
vocab_size not found in data/openwebtext/meta.pkl, using GPT-2 default of 50257
Initializing a new model from scratch
number of parameters: 124.34M
compiling the model... (takes a ~minute)
To use data.metrics please...
![image](https://user-images.githubusercontent.com/5590961/216866285-72afbfa0-c4a0-4f91-89b4-bafec3361f96.png) I am using Ubuntu on Windows (WSL) with an RTX 3080, running `python3 train.py config/train_shakespeare_char.py --compile=False --batch_size=128`. Even if I increase the batch size a bit, the utilization of the GPU computation...
Can I volunteer to add a tests folder to the root and some unit tests for the train script? Any preference for pytest, unittest, or something else? I noticed the...
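If it helps, a minimal pytest sketch of what such a test could look like (assuming nanoGPT's `model.GPT` and `GPTConfig`; the file name and assertions are hypothetical):

```python
# tests/test_model.py -- hypothetical starting point; run with `pytest`
import torch
from model import GPT, GPTConfig

def test_forward_shapes_and_finite_loss():
    # A deliberately tiny config so the test runs in seconds on CPU.
    config = GPTConfig(block_size=32, vocab_size=64, n_layer=2,
                       n_head=2, n_embd=32, dropout=0.0, bias=False)
    model = GPT(config)
    idx = torch.randint(0, config.vocab_size, (4, config.block_size))
    logits, loss = model(idx, targets=idx)
    assert logits.shape == (4, config.block_size, config.vocab_size)
    assert torch.isfinite(loss)
```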
Not exactly sure why, presumably the compiler is doing some dark magic, but I get *slightly* better performance when using `nn.GELU()` instead of `new_gelu`.
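For reference, `nn.GELU()` defaults to the exact erf-based GELU, while `nn.GELU(approximate='tanh')` reproduces `new_gelu`'s tanh approximation; a quick sketch to compare them:

```python
import math
import torch
import torch.nn as nn

# nanoGPT's hand-written tanh approximation of GELU:
def new_gelu(x):
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi)
                      * (x + 0.044715 * torch.pow(x, 3.0))))

x = torch.randn(4, 16)
# approximate='tanh' matches new_gelu; the default ('none') is the exact form.
print(torch.allclose(new_gelu(x), nn.GELU(approximate='tanh')(x), atol=1e-6))  # True
print(torch.allclose(new_gelu(x), nn.GELU()(x), atol=1e-6))  # usually False (different math)
```

So if the two need to match numerically, `approximate='tanh'` is the like-for-like swap.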
Like the title says, I've been getting "loss: nan" after training for about 2000 iterations. The only things I changed in the script were the block size (from 1024 to...
I would like to use my own data to train an AI to play Codenames. Is there anybody interested enough to give me a little guidance, please? **What...
I used 4 GPUs on 1 node: `torchrun --standalone --nproc_per_node=4 train.py --compile=False`. But the training speed is the same as with 1 GPU. Why?
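One quick sanity check is to print the world size each process sees; a hypothetical `check_ddp.py`, launched the same way (assumes NCCL and the four GPUs from the command above):

```python
# check_ddp.py -- launch with: torchrun --standalone --nproc_per_node=4 check_ddp.py
import os
import torch.distributed as dist

dist.init_process_group(backend="nccl")  # torchrun sets RANK/WORLD_SIZE env vars
rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])
print(f"rank {rank} of {world_size}")  # expect 4 lines, each reporting 4 processes
dist.destroy_process_group()
```

If the world size printed is 1, the training processes were not actually launched under DDP as intended.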
I tried to quickly train a baby GPT with the settings provided in the config, using PyTorch 2.0 on an RTX 2080 Ti: `python train.py config/train_shakespeare_char.py --compile=True`. I got a segmentation fault...