# nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Results: 297 nanoGPT issues

If I understand correctly, you have at most 600,000 iterations times batches of 12, which is roughly 7M training examples fed to the transformer, far smaller than the 9B tokens of...
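A back-of-envelope check of the figures in this snippet (the 600,000 iterations and batch size of 12 come from the comment; the block size of 1024 is an assumption based on nanoGPT's default, and gradient accumulation is ignored):

```python
def examples_seen(max_iters, batch_size):
    """Total training examples (sequences) fed to the model."""
    return max_iters * batch_size

def tokens_seen(max_iters, batch_size, block_size):
    """Total tokens seen, assuming every sequence is block_size tokens long."""
    return max_iters * batch_size * block_size

# Figures from the comment: 600,000 iterations, batches of 12.
n = examples_seen(600_000, 12)            # 7,200,000 sequences, i.e. roughly 7M
# With nanoGPT's default block_size of 1024 (an assumption here), that is
# about 7.4B tokens -- the same ballpark as the 9B mentioned in the issue.
t = tokens_seen(600_000, 12, 1024)
```

So whether the run looks small depends on whether you count sequences or tokens.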

Hi, I love this project as a way to learn from scratch with local development. I was able to finetune the model, generate the checkpoints, and generate the samples. Is there an...

Hello, I have an issue while loading my dataset in prepare.py (for openwebtext). The download and the extraction complete successfully, but the generation of the train split raises an error. I've already...

This PR is a mostly failed attempt to fix [issue #95](https://github.com/karpathy/minGPT/issues/95) from the minGPT repo. The idea is to cache the results of the key and value projections in each self-attention...
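For context, the idea in that PR — caching key/value projections so past tokens are not re-projected on every generation step — can be sketched like this (names and values are illustrative placeholders, not the PR's actual code):

```python
class KVCache:
    """Toy per-layer cache of key/value projections for autoregressive decoding.

    Instead of recomputing k and v for the whole prefix at every step, we
    append only the newest token's projections and reuse everything cached.
    """
    def __init__(self):
        self.keys = []    # one entry per past token
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def full(self):
        # Attention for the new token then runs against all cached k/v pairs.
        return self.keys, self.values

# Decode three tokens: each step projects only the newest token.
cache = KVCache()
for step in range(3):
    # Stand-ins for the real key/value projections of the new token.
    cache.append(("k", step), ("v", step))
keys, values = cache.full()
```

The payoff is that per-step cost becomes linear in the prefix length (the attention itself) rather than re-running the projections over the whole prefix.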

Try on a cluster using multiple nodes. Example: 1) run `torchrun --nproc_per_node=2 --nnodes=2 --node_rank=0 --master_addr= --master_port=1234 train.py --dataset=shakespeare --dtype=float16 --batch_size=2 --compile=False`. Got errors: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to...
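For what it's worth, a quick way to confirm what that launch implies on each node: torchrun exports `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` into each process's environment, and with `--nproc_per_node=2 --nnodes=2` the global world size should be 4. A minimal sketch reading those standard variables, with single-process fallbacks:

```python
import os

def ddp_config():
    """Read the rank/world-size variables that torchrun sets per process."""
    return {
        "rank": int(os.environ.get("RANK", 0)),              # global rank
        "local_rank": int(os.environ.get("LOCAL_RANK", 0)),  # rank within the node
        "world_size": int(os.environ.get("WORLD_SIZE", 1)),  # total process count
    }

# With --nproc_per_node=2 --nnodes=2, torchrun launches 4 processes total.
cfg = ddp_config()
```

Note that `--batch_size` in nanoGPT is per process, so the OOM above is about a single GPU's memory; adding nodes doesn't shrink it — lowering `batch_size` or `block_size` does.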

I noticed the comment that you're using torch 2.0 and that if you encounter warnings you should set `--compile=False`. The problem I'm running into is that flash is auto-detected: # flash attention make GPU...
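The auto-detection being referenced is a feature probe: nanoGPT checks whether `torch.nn.functional` exposes `scaled_dot_product_attention` (the fused/flash path added in PyTorch 2.0) and falls back to a manual attention implementation otherwise. A minimal sketch of that pattern, with the module passed in so the check is explicit:

```python
def supports_flash(functional_module):
    """True when the fused scaled-dot-product-attention entry point exists.

    In practice this would be called as supports_flash(torch.nn.functional);
    the attribute was added in torch 2.0, so older installs return False.
    """
    return hasattr(functional_module, "scaled_dot_product_attention")
```

Because this is an attribute check rather than a config flag, installs older than 2.0 silently take the slow path, which is why the warning in the issue appears regardless of `--compile`.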

Implements torch SDPA for mem_efficient kernel support! Using the mem_efficient kernel results in ~15.5% faster training time per batch, going from a ~154 ms/batch baseline to ~130 ms/batch. (Ran on 8...

Like the title says, I have a $300 AWS credit, and I'm curious if any of you have worked out the cost to train the XL model. If anyone is interested...
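One rough way to frame that question, with all numbers hypothetical (the hourly GPU price and the GPU-hours a training run needs vary widely by instance type and configuration):

```python
def training_cost(gpu_hours, price_per_gpu_hour):
    """Back-of-envelope cloud cost: GPU-hours times the hourly price."""
    return gpu_hours * price_per_gpu_hour

def affordable_gpu_hours(budget, price_per_gpu_hour):
    """How many GPU-hours a fixed credit buys at a given hourly price."""
    return budget / price_per_gpu_hour

# Hypothetical example: at $3/GPU-hour, a $300 credit buys 100 GPU-hours.
hours = affordable_gpu_hours(300, 3.0)
```

Comparing that GPU-hour budget against a measured tokens-per-second throughput for the XL config would then give an estimate of how far the credit actually goes.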