
The simplest, fastest repository for training/finetuning medium-sized GPTs.

Results: 297 nanoGPT issues

https://github.com/karpathy/nanoGPT/blob/325be85d9be8c81b436728a420e85796c57dba7e/train.py#L106 In my implementation of the code, I modified this line to incorporate the iteration into the seed. I suspect that if you resume training multiple times, the random seed...
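
A minimal sketch of the change the reporter describes, assuming `iter_num` has already been restored from the checkpoint and `seed_offset` is the per-process offset train.py sets up for DDP; folding `iter_num` into the seed is the reporter's modification, not upstream behavior:

```python
import torch

# Sketch (not upstream nanoGPT): fold the resumed iteration count into the
# RNG seed so each resume draws a fresh random stream instead of replaying
# the same one.
seed_offset = 0   # per-process offset, as in train.py (rank under DDP)
iter_num = 0      # restored from the checkpoint when init_from='resume'

torch.manual_seed(1337 + seed_offset + iter_num)
```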

Hi all, thanks @karpathy for this; the lectures are also awesome! I am training a GPT-2 model; the loss is decreasing and everything looks fine, except my MFU values...

I was getting 104% MFU on an H100, then I realized that the MFU calculation might have been based on the A100's 312 TFLOPS; the H100 does 989 TFLOPS at bfloat16. NVIDIA claims 1,979 TFLOPS...
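
This matches the fact that nanoGPT's `estimate_mfu` divides by a hard-coded A100 bfloat16 peak of 312 TFLOPS. A minimal sketch of the same ratio with the peak made a parameter (the function name and example numbers below are illustrative, not from the repo):

```python
def estimate_mfu_sketch(flops_per_iter, dt, peak_flops=312e12):
    """Sketch of an MFU estimate with a configurable hardware peak, rather
    than the A100 bfloat16 figure (312 TFLOPS) nanoGPT hard-codes.
    For an H100 SXM, the dense bfloat16 peak is ~989e12; NVIDIA's 1,979
    TFLOPS figure assumes 2:4 sparsity, which dense training does not use."""
    flops_achieved = flops_per_iter / dt  # FLOPs per second actually sustained
    return flops_achieved / peak_flops

# Example: an iteration doing 1.2e15 FLOPs in 3.0 s on an H100 -> ~0.40 MFU
print(estimate_mfu_sketch(1.2e15, 3.0, peak_flops=989e12))
```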

It's my code and everything is in a copyright.

I observe that the loss converges at around 100,000 steps. Why do we need to keep training the model until 600,000 steps?

I wonder what to change in the code if, at inference time, I only want the logits and to sample from the probability distribution over a subset of the total vocabulary...
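
One way to do this is to mask every logit outside the allowed subset to -inf before the softmax, so the multinomial draw can only land on the chosen ids. A sketch, with a hypothetical `allowed_ids` argument that is not part of nanoGPT's `sample.py`:

```python
import torch
import torch.nn.functional as F

def sample_from_subset(logits, allowed_ids, temperature=1.0):
    """Sketch: restrict sampling to a subset of the vocabulary by masking
    every other logit to -inf before the softmax."""
    mask = torch.full_like(logits, float('-inf'))
    mask[..., allowed_ids] = 0.0
    probs = F.softmax(logits / temperature + mask, dim=-1)
    return torch.multinomial(probs, num_samples=1)

# Toy usage: 10-token vocab, only ids 2, 5 and 7 may be sampled
logits = torch.randn(1, 10)
next_id = sample_from_subset(logits, [2, 5, 7])
print(next_id)
```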

In [train.py](https://github.com/karpathy/nanoGPT/blob/master/train.py#L59C1-L59C10) `max_iters` is set to 600,000; however, the loss gets close to 2.8 much earlier, around iteration 300,000, and fluctuates a bit there. I wonder if one can do early stopping...
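
nanoGPT's training loop has no early-stopping logic; a patience-based check around the periodic `estimate_loss()` eval could look like the sketch below (the `EarlyStopper` class and its thresholds are illustrative, not from the repo):

```python
class EarlyStopper:
    """Sketch of a patience-based early stop: stop once the validation loss
    has not improved for `patience` consecutive evals."""
    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float('inf')
        self.bad_evals = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience

# Inside train.py's eval block (names as in upstream), roughly:
# if iter_num % eval_interval == 0 and master_process:
#     losses = estimate_loss()
#     if stopper.should_stop(losses['val']):
#         break
```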

This needs fixing, since that package in Debian 12 isn't compiled with CUDA; you should make this able to train without CUDA as well. It should be up to us to choose...
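
A sketch of the kind of fallback being asked for: pick the best available backend instead of assuming `'cuda'` (the preference order here is an assumption). Note that the config already lets you override the device on the command line, e.g. `python train.py --device=cpu --compile=False`, which sidesteps the CUDA-only assumption on such systems.

```python
import torch

# Sketch: choose a device for systems where PyTorch ships without CUDA
# support (e.g. the Debian 12 package mentioned above).
if torch.cuda.is_available():
    device = 'cuda'
elif getattr(torch.backends, 'mps', None) is not None and torch.backends.mps.is_available():
    device = 'mps'
else:
    device = 'cpu'
print(f"using device: {device}")
```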

When running `python sample.py --init_from=gpt2 --num_samples=2 --max_new_tokens=100` with `device = 'mps'` set on my M1 Pro MacBook (macOS 14.4), on both Torch 2.2.1 and 2.2.0, I get this output:...