nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Results: 297 nanoGPT issues, sorted by recently updated

To run quick experiments: 1) run the training from [here](https://github.com/HamidShojanazeri/nanoGPT/tree/tp_compile#quick-start), which is very quick, then 2) run the TP inference: `sh run_tp.sh`

Excuse me, I just want to ask whether there has been any progress on evaluating zero-shot perplexities on standard evals (e.g. LAMBADA, HELM, etc.). We're using this repo for downstream eval to show...
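
For anyone picking this up, a minimal sketch of how zero-shot perplexity could be computed with a trained checkpoint; `model` and `eval_ids` are placeholders, not names from the repo:

```python
import math
import torch

# Minimal sketch, not part of the repo: zero-shot perplexity over a tokenized
# eval corpus. Assumes `model` is a loaded nanoGPT GPT (whose forward returns
# (logits, loss)) and `eval_ids` is a 1-D LongTensor of token ids -- both are
# placeholders standing in for e.g. a tokenized LAMBADA split.
@torch.no_grad()
def perplexity(model, eval_ids, block_size=1024, device='cuda'):
    model.eval()
    nll_sum, n_tokens = 0.0, 0
    for i in range(0, eval_ids.numel() - 1, block_size):
        chunk = eval_ids[i : i + block_size + 1].to(device)
        x, y = chunk[:-1].unsqueeze(0), chunk[1:].unsqueeze(0)
        _, loss = model(x, y)               # cross-entropy averaged over y
        nll_sum += loss.item() * y.numel()
        n_tokens += y.numel()
    return math.exp(nll_sum / n_tokens)
```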

Each sample of openwebtext consists of several paragraphs extracted from a single webpage. nanoGPT is trained to predict each token of a sample given its previous tokens. For the train split...
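
For context, this is roughly how train.py's `get_batch` forms those (x, y) pairs from the tokenized openwebtext split (paths and dtypes assumed from data/openwebtext/prepare.py):

```python
import numpy as np
import torch

# Rough sketch of how train.py forms (x, y) pairs from the tokenized train
# split: y is x shifted right by one token, so position t is trained to
# predict token t+1.
block_size, batch_size = 1024, 12
data = np.memmap('data/openwebtext/train.bin', dtype=np.uint16, mode='r')
ix = torch.randint(len(data) - block_size, (batch_size,))
x = torch.stack([torch.from_numpy(data[i : i + block_size].astype(np.int64)) for i in ix])
y = torch.stack([torch.from_numpy(data[i + 1 : i + 1 + block_size].astype(np.int64)) for i in ix])
```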

This PR adds improvements to the console output during training.
- [x] Cleaner output
- [x] Align columns
- [x] Remove duplicate information from `log_interval` output and move...

Hi! The effective batch size of nanoGPT is batch_size * gradient_accumulation_steps = 12 * 40 = 480 sequences, whereas the batch size mentioned in the GPT-2 paper is 512. May I ask why nanoGPT was trained with...
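
For reference, a quick arithmetic check of the token counts behind those numbers (values assumed from config/train_gpt2.py; the GPT-2 figure assumes 512 sequences of 1024 tokens):

```python
# Back-of-the-envelope check, assuming the defaults in config/train_gpt2.py:
batch_size = 12                       # micro-batch size per GPU
gradient_accumulation_steps = 5 * 8   # 40 (accumulated across 8 GPUs)
block_size = 1024

sequences_per_step = batch_size * gradient_accumulation_steps    # 480
tokens_per_step = sequences_per_step * block_size                # 491,520
gpt2_tokens_per_step = 512 * 1024                                # 524,288 (paper: 512 sequences of 1024 tokens)
print(sequences_per_step, tokens_per_step, gpt2_tokens_per_step)
```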

Hi, right now, at each training step, `y` is ahead of `x` by one token. I'm just wondering whether we can train on 2 or even more future tokens in...
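
A naive sketch of what training on k future tokens might look like on the data side, purely for illustration (the repo does not do this; `get_multi_offset_batch` is a made-up helper):

```python
import torch

# Naive illustration only: build targets for the next k future tokens by
# shifting the same window by 1, 2, ..., k. A real design would probably want
# a separate prediction head per offset rather than reusing the single lm_head.
# `data` is assumed to be a 1-D LongTensor of token ids.
def get_multi_offset_batch(data, block_size, batch_size, k=2):
    ix = torch.randint(len(data) - block_size - k, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix])
    ys = [torch.stack([data[i + off : i + off + block_size] for i in ix])
          for off in range(1, k + 1)]   # ys[0] is the usual y; ys[1] is two tokens ahead
    return x, ys
```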

Hi, I'm thinking about adding a special END OF TEXT token to my data (to separate different articles), e.g. https://github.com/karpathy/nanoGPT/issues/244. I checked here: https://github.com/karpathy/nanoGPT/blob/eba36e84649f3c6d840a93092cb779a260544d08/data/shakespeare_char/prepare.py#L51 and I'm wondering if the training...
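
One possible approach, sketched against the variable names in data/shakespeare_char/prepare.py (`articles` and `EOT_ID` are made up for illustration):

```python
# Hedged sketch of one way to do it in a char-level prepare.py: reserve one
# extra id beyond the ordinary characters and append it after each article
# before concatenating. `data` and `articles` are placeholders here.
chars = sorted(list(set(data)))
stoi = {ch: i for i, ch in enumerate(chars)}
EOT_ID = len(chars)                 # the extra, never-printed separator id
vocab_size = len(chars) + 1         # remember to store this in meta.pkl

def encode_article(s):
    return [stoi[c] for c in s] + [EOT_ID]

ids = [tok for article in articles for tok in encode_article(article)]
```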

Change the use of `total_batches`; it is currently hard-coded, so it throws an IndexError on the last batch and for validation.
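
Assuming `total_batches` counts batches per split, one illustrative fix would be to derive it from the split length rather than hard-coding it, e.g.:

```python
# Illustration only: derive the number of batches from the length of the split
# so the last (possibly short) batch and the validation split never index past
# the end of the data, instead of relying on a hard-coded total.
def num_batches(split_len, block_size, batch_size):
    tokens_per_batch = block_size * batch_size
    return max(1, (split_len - 1) // tokens_per_batch)
```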

Hi, I noticed the seed is set to 1337 here: https://github.com/karpathy/nanoGPT/blob/eba36e84649f3c6d840a93092cb779a260544d08/sample.py#L19 which is the same as in train.py (for single GPU). Does this seed have to be the same? Or...
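
For what it's worth, the seed in sample.py only governs generation-time randomness, so it shouldn't need to match train.py's; a tiny illustration (42 chosen arbitrarily):

```python
import torch

# The seed in sample.py just makes generation reproducible; picking a
# different value simply yields a different but repeatable set of samples.
torch.manual_seed(42)
torch.cuda.manual_seed(42)
```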

Hello, I've been attempting to fine-tune a GPT-XL model on a Shakespeare dataset using an MPS device. However, I've noticed that both the training loss and evaluation loss are significantly...