
The simplest, fastest repository for training/finetuning medium-sized GPTs.

Results: 297 nanoGPT issues

https://github.com/karpathy/nanoGPT/blob/325be85d9be8c81b436728a420e85796c57dba7e/train.py#L106 In my implementation of the code, I modified this line to incorporate the iteration into the seed. I suspect that if you resume training multiple times, the random seed...
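
A minimal sketch of the change the reporter describes, assuming `iter_num` has already been restored from the checkpoint and `seed_offset` is the per-process offset train.py sets up for DDP; folding `iter_num` into the seed is the reporter's modification, not upstream behavior:

```python
import torch

# Sketch (not upstream nanoGPT): fold the resumed iteration count into the
# RNG seed so each resume draws a fresh random stream instead of replaying
# the same one.
seed_offset = 0   # per-process offset, as in train.py (rank under DDP)
iter_num = 0      # restored from the checkpoint when init_from='resume'

torch.manual_seed(1337 + seed_offset + iter_num)
```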

Hi all, thanks @karpathy for this; the lectures are also awesome! I am training a GPT-2 model; the loss is decreasing and everything looks fine, except my MFU values...

I was getting 104% MFU on an H100, then I realized that the MFU calculation might have been based on the A100's 312 TFLOPS; the H100 does 989 TFLOPS at bfloat16. NVIDIA claims 1,979 TFLOPS...
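
This matches the fact that nanoGPT's `estimate_mfu` divides by a hard-coded A100 bfloat16 peak of 312 TFLOPS. A minimal sketch of the same ratio with the peak made a parameter (the function name and example numbers below are illustrative, not from the repo):

```python
def estimate_mfu_sketch(flops_per_iter, dt, peak_flops=312e12):
    """Sketch of an MFU estimate with a configurable hardware peak, rather
    than the A100 bfloat16 figure (312 TFLOPS) nanoGPT hard-codes.
    For an H100 SXM, the dense bfloat16 peak is ~989e12; NVIDIA's 1,979
    TFLOPS figure assumes 2:4 sparsity, which dense training does not use."""
    flops_achieved = flops_per_iter / dt  # FLOPs per second actually sustained
    return flops_achieved / peak_flops

# Example: an iteration doing 1.2e15 FLOPs in 3.0 s on an H100 -> ~0.40 MFU
print(estimate_mfu_sketch(1.2e15, 3.0, peak_flops=989e12))
```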

It's my code and everything is in a copyright.

I observe that the loss converges at around 100,000 steps. Why do we need to keep training the model until 600,000 steps?

I wonder what to change in the code if, at inference time, I only want the logits and to sample from the probability distribution over a subset of the total vocabulary...
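
One way to do this is to mask every logit outside the allowed subset to -inf before the softmax, so the multinomial draw can only land on the chosen ids. A sketch, with a hypothetical `allowed_ids` argument that is not part of nanoGPT's `sample.py`:

```python
import torch
import torch.nn.functional as F

def sample_from_subset(logits, allowed_ids, temperature=1.0):
    """Sketch: restrict sampling to a subset of the vocabulary by masking
    every other logit to -inf before the softmax."""
    mask = torch.full_like(logits, float('-inf'))
    mask[..., allowed_ids] = 0.0
    probs = F.softmax(logits / temperature + mask, dim=-1)
    return torch.multinomial(probs, num_samples=1)

# Toy usage: 10-token vocab, only ids 2, 5 and 7 may be sampled
logits = torch.randn(1, 10)
next_id = sample_from_subset(logits, [2, 5, 7])
print(next_id)
```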

In [train.py](https://github.com/karpathy/nanoGPT/blob/master/train.py#L59C1-L59C10) `max_iters` is set to 600,000; however, the loss gets close to 2.8 much earlier, around iteration 300,000, and fluctuates a bit there. I wonder if one can do early stopping...
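
nanoGPT's training loop has no early-stopping logic; a patience-based check around the periodic `estimate_loss()` eval could look like the sketch below (the `EarlyStopper` class and its thresholds are illustrative, not from the repo):

```python
class EarlyStopper:
    """Sketch of a patience-based early stop: stop once the validation loss
    has not improved for `patience` consecutive evals."""
    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float('inf')
        self.bad_evals = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience

# Inside train.py's eval block (names as in upstream), roughly:
# if iter_num % eval_interval == 0 and master_process:
#     losses = estimate_loss()
#     if stopper.should_stop(losses['val']):
#         break
```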

This needs fixing, since that package in Debian 12 isn't compiled with CUDA; you should make this able to train without CUDA as well. It should be up to us to choose...
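
A sketch of the kind of fallback being asked for: pick the best available backend instead of assuming `'cuda'` (the preference order here is an assumption). Note that the config already lets you override the device on the command line, e.g. `python train.py --device=cpu --compile=False`, which sidesteps the CUDA-only assumption on such systems.

```python
import torch

# Sketch: choose a device for systems where PyTorch ships without CUDA
# support (e.g. the Debian 12 package mentioned above).
if torch.cuda.is_available():
    device = 'cuda'
elif getattr(torch.backends, 'mps', None) is not None and torch.backends.mps.is_available():
    device = 'mps'
else:
    device = 'cpu'
print(f"using device: {device}")
```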

When running `python sample.py --init_from=gpt2 --num_samples=2 --max_new_tokens=100` with `device = 'mps'` set on my M1 Pro MacBook (macOS 14.4), on both Torch 2.2.1 and 2.2.0, I get this output:...