nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Results: 297 nanoGPT issues

Is this repository more or less ready for finetuning on code translation tasks? E.g. I'd like to explore some ideas for converting Figma files to framework-specific code. I'm new to LLMs, any advice...

Either standard tools (`pip freeze`, `virtualenv`) or something more involved, like conda or poetry.
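For the pip route, the gist of what `pip freeze` produces can be sketched with the standard library alone (a minimal illustration of the idea, not how pip itself is implemented):

```python
# List installed distributions pinned to exact versions, one per line,
# in the "name==version" format that requirements.txt files use.
from importlib.metadata import distributions

pins = sorted(f"{d.metadata['Name']}=={d.version}" for d in distributions())
print("\n".join(pins))  # e.g. "numpy==1.26.4", "torch==2.1.0", ...
```

Writing that output to a `requirements.txt` and installing from it later is the simplest way to reproduce an environment.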

In my opinion it deserves it

I'm testing out train.py on Google Colab but no checkpoints are created, even after iteration 1000. I'm using this command: `!cd /content/nanoGPT/ && python train.py --dataset=shakespeare --compile=False --n_layer=4 --n_head=4...
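One likely explanation, sketched here under the assumption that train.py follows nanoGPT's defaults (verify against the source): checkpoints are only considered on `eval_interval` boundaries, and only written when the validation loss improves or `always_save_checkpoint` is set. With the default `eval_interval` of 2000, nothing would be saved by iteration 1000:

```python
# Hedged sketch of the checkpoint gating in nanoGPT's train.py
# (parameter names and defaults taken from the repo; check the source).
eval_interval = 2000          # default in train.py
always_save_checkpoint = True # default in train.py

def should_checkpoint(iter_num, val_loss, best_val_loss):
    on_eval_step = iter_num % eval_interval == 0
    improved = val_loss < best_val_loss
    return on_eval_step and (improved or always_save_checkpoint)

print(should_checkpoint(1000, 1.5, 2.0))  # False: 1000 is not a multiple of 2000
print(should_checkpoint(2000, 1.5, 2.0))  # True: eval boundary reached
```

Passing a smaller `--eval_interval` on the command line should make checkpoints appear sooner.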

Question out of curiosity: does someone know the main trick to getting nanoGPT to train GPT-2 so quickly? (From the 1 week I was used to, it seems down to about 1 day.) https://github.com/karpathy/nanoGPT After...

https://github.com/karpathy/nanoGPT/blob/7f74652843d8cbea31e2a9c986caf4a0ad452a6c/model.py#L136 I'd like to ask why nanoGPT doesn't try other kinds of positional embeddings. What is the advantage of using a learnable position embedding? Thanks.
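For context, the linked line uses GPT-2-style learned absolute position embeddings: a trainable table with one vector per position, added to the token embeddings. A minimal sketch of that mechanism (illustrative shapes, not nanoGPT's exact code):

```python
import torch
import torch.nn as nn

# Toy sizes for illustration; nanoGPT's GPT-2 config uses much larger ones.
block_size, vocab_size, n_embd = 8, 50257, 32
wte = nn.Embedding(vocab_size, n_embd)  # token embedding table
wpe = nn.Embedding(block_size, n_embd)  # learned positional embedding table

idx = torch.randint(vocab_size, (2, block_size))  # (batch, time) token ids
pos = torch.arange(block_size)                    # positions 0..T-1
x = wte(idx) + wpe(pos)                           # broadcast add over the batch
print(x.shape)  # torch.Size([2, 8, 32])
```

Because `wpe` is an ordinary `nn.Embedding`, the position vectors are learned by backprop like any other weights, rather than being fixed sinusoids.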

The readability of huge numbers can be improved by adding `:,`, which inserts a comma every three digits.
```
print(f"train has {len(train_ids)} tokens")
print(f"train has {len(train_ids):,} tokens")
```
...
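The `:,` format specifier (PEP 378) groups digits with commas; a quick self-contained check (the token count below is made up for illustration):

```python
n_tokens = 301966  # hypothetical token count, just for illustration
print(f"train has {n_tokens} tokens")    # train has 301966 tokens
print(f"train has {n_tokens:,} tokens")  # train has 301,966 tokens
```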

Thanks so much for this. If you could work out simple image generation as well, that would be wonderful. minGPT has this function; I'd prefer a more mature one like put...

Thanks @karpathy for this nice little project. I also really enjoyed watching the YouTube lecture that goes with it! In my proposal I'm addressing the comment here: https://github.com/karpathy/nanoGPT/blob/master/configurator.py#L12 A slight...

### Ran train.py using `torchrun --standalone --nproc_per_node=1 train.py --dataset=shakespeare --dtype=float32 --batch_size=8 --compile=True` and got the following error: master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified....