nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Results: 297 nanoGPT issues, sorted by recently updated

To run quick experiments: 1) run the training from [here](https://github.com/HamidShojanazeri/nanoGPT/tree/tp_compile#quick-start), which is very quick, then 2) run the TP inference: `sh run_tp.sh`

Excuse me, I just want to ask whether there has been any progress on evaluating zero-shot perplexities on standard evals (e.g. LAMBADA, HELM, etc.). We're using this repo for downstream eval to show...
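
For anyone picking this up, a minimal sketch of how zero-shot perplexity could be computed with a trained checkpoint; `model` and `eval_ids` are placeholders, not names from the repo:

```python
import math
import torch

# Minimal sketch, not part of the repo: zero-shot perplexity over a tokenized
# eval corpus. Assumes `model` is a loaded nanoGPT GPT (whose forward returns
# (logits, loss)) and `eval_ids` is a 1-D LongTensor of token ids -- both are
# placeholders standing in for e.g. a tokenized LAMBADA split.
@torch.no_grad()
def perplexity(model, eval_ids, block_size=1024, device='cuda'):
    model.eval()
    nll_sum, n_tokens = 0.0, 0
    for i in range(0, eval_ids.numel() - 1, block_size):
        chunk = eval_ids[i : i + block_size + 1].to(device)
        x, y = chunk[:-1].unsqueeze(0), chunk[1:].unsqueeze(0)
        _, loss = model(x, y)               # cross-entropy averaged over y
        nll_sum += loss.item() * y.numel()
        n_tokens += y.numel()
    return math.exp(nll_sum / n_tokens)
```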

Each sample of openwebtext consists of several paragraphs extracted from a single webpage. nanoGPT is trained to predict each token of a sample given its previous tokens. For the train split...
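
For context, this is roughly how train.py's `get_batch` forms those (x, y) pairs from the tokenized openwebtext split (paths and dtypes assumed from data/openwebtext/prepare.py):

```python
import numpy as np
import torch

# Rough sketch of how train.py forms (x, y) pairs from the tokenized train
# split: y is x shifted right by one token, so position t is trained to
# predict token t+1.
block_size, batch_size = 1024, 12
data = np.memmap('data/openwebtext/train.bin', dtype=np.uint16, mode='r')
ix = torch.randint(len(data) - block_size, (batch_size,))
x = torch.stack([torch.from_numpy(data[i : i + block_size].astype(np.int64)) for i in ix])
y = torch.stack([torch.from_numpy(data[i + 1 : i + 1 + block_size].astype(np.int64)) for i in ix])
```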

This PR adds improvements to the console output during training.
- [x] Cleaner output
- [x] Align columns
- [x] Remove duplicate information from `log_interval` output and move...

Hi! The effective batch size of nanoGPT is batch_size * gradient_accumulation_steps = 12 * 40 = 480 sequences, whereas the batch size mentioned in the GPT-2 paper is 512. May I ask why nanoGPT was trained with...
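
For reference, a quick arithmetic check of the token counts behind those numbers (values assumed from config/train_gpt2.py; the GPT-2 figure assumes 512 sequences of 1024 tokens):

```python
# Back-of-the-envelope check, assuming the defaults in config/train_gpt2.py:
batch_size = 12                       # micro-batch size per GPU
gradient_accumulation_steps = 5 * 8   # 40 (accumulated across 8 GPUs)
block_size = 1024

sequences_per_step = batch_size * gradient_accumulation_steps    # 480
tokens_per_step = sequences_per_step * block_size                # 491,520
gpt2_tokens_per_step = 512 * 1024                                # 524,288 (paper: 512 sequences of 1024 tokens)
print(sequences_per_step, tokens_per_step, gpt2_tokens_per_step)
```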

Hi, right now, at each training step, `y` is ahead of `x` by one token. I'm just wondering whether we can train on 2 or even more future tokens in...
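
A naive sketch of what training on k future tokens might look like on the data side, purely for illustration (the repo does not do this; `get_multi_offset_batch` is a made-up helper):

```python
import torch

# Naive illustration only: build targets for the next k future tokens by
# shifting the same window by 1, 2, ..., k. A real design would probably want
# a separate prediction head per offset rather than reusing the single lm_head.
# `data` is assumed to be a 1-D LongTensor of token ids.
def get_multi_offset_batch(data, block_size, batch_size, k=2):
    ix = torch.randint(len(data) - block_size - k, (batch_size,))
    x = torch.stack([data[i : i + block_size] for i in ix])
    ys = [torch.stack([data[i + off : i + off + block_size] for i in ix])
          for off in range(1, k + 1)]   # ys[0] is the usual y; ys[1] is two tokens ahead
    return x, ys
```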

Hi, I'm thinking about adding a special END OF TEXT token to my data (to separate different articles), e.g. https://github.com/karpathy/nanoGPT/issues/244. I checked here: https://github.com/karpathy/nanoGPT/blob/eba36e84649f3c6d840a93092cb779a260544d08/data/shakespeare_char/prepare.py#L51 and I'm wondering if the training...
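
One possible approach, sketched against the variable names in data/shakespeare_char/prepare.py (`articles` and `EOT_ID` are made up for illustration):

```python
# Hedged sketch of one way to do it in a char-level prepare.py: reserve one
# extra id beyond the ordinary characters and append it after each article
# before concatenating. `data` and `articles` are placeholders here.
chars = sorted(list(set(data)))
stoi = {ch: i for i, ch in enumerate(chars)}
EOT_ID = len(chars)                 # the extra, never-printed separator id
vocab_size = len(chars) + 1         # remember to store this in meta.pkl

def encode_article(s):
    return [stoi[c] for c in s] + [EOT_ID]

ids = [tok for article in articles for tok in encode_article(article)]
```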

Change the use of `total_batches`; it is currently hard-coded, so it throws an IndexError on the last batch and for validation.
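
Assuming `total_batches` counts batches per split, one illustrative fix would be to derive it from the split length rather than hard-coding it, e.g.:

```python
# Illustration only: derive the number of batches from the length of the split
# so the last (possibly short) batch and the validation split never index past
# the end of the data, instead of relying on a hard-coded total.
def num_batches(split_len, block_size, batch_size):
    tokens_per_batch = block_size * batch_size
    return max(1, (split_len - 1) // tokens_per_batch)
```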

Hi, I noticed the seed is set to 1337 here: https://github.com/karpathy/nanoGPT/blob/eba36e84649f3c6d840a93092cb779a260544d08/sample.py#L19 which is the same as in train.py (for single GPU). Does this seed have to be the same? Or...
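
For what it's worth, the seed in sample.py only governs generation-time randomness, so it shouldn't need to match train.py's; a tiny illustration (42 chosen arbitrarily):

```python
import torch

# The seed in sample.py just makes generation reproducible; picking a
# different value simply yields a different but repeatable set of samples.
torch.manual_seed(42)
torch.cuda.manual_seed(42)
```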

Hello, I've been attempting to fine-tune a GPT-XL model on a Shakespeare dataset using an MPS device. However, I've noticed that both the training loss and evaluation loss are significantly...