
The simplest, fastest repository for training/finetuning medium-sized GPTs.

Results: 297 nanoGPT issues, sorted by recently updated

Hello, could someone explain to me how the dataset is divided between all the GPUs? I know that PyTorch has something like DistributedSampler to do that, but I...
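For context on the question above: PyTorch's `DistributedSampler` gives each rank every `num_replicas`-th index, padding the index list by wrapping around so every rank sees the same number of samples. (Note that nanoGPT's `train.py` doesn't use a sampler at all; each DDP rank just draws random batches from the memory-mapped token file with a rank-dependent seed.) A minimal pure-Python sketch of the round-robin sharding, with `shuffle` omitted for clarity — `shard_indices` is my name, not a PyTorch API:

```python
import math

def shard_indices(dataset_len, num_replicas, rank):
    """Mimic DistributedSampler's index assignment (no shuffling).

    Each rank receives every num_replicas-th index; the index list is
    padded by wrapping around so all ranks get the same sample count.
    """
    num_samples = math.ceil(dataset_len / num_replicas)   # per-rank count
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    indices += indices[: total_size - dataset_len]        # pad by wrapping
    return indices[rank:total_size:num_replicas]          # strided slice for this rank

# Example: 10 samples split across 3 GPUs
for r in range(3):
    print(r, shard_indices(10, 3, r))
```

With 10 samples and 3 replicas, each rank gets 4 indices and the first two samples appear twice in an epoch — the same padding behavior the real sampler has.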

Hello, I am training GPT-2 from scratch, but I found that the data processing of openwebtext is too slow, and our GPU server can't connect to the internet. It's taken...

*Accidentally messed up the PR and the branch, so let's try one more time.* I really don't like making such somewhat big PRs, but I don't want to bombard with...

I have been trying to use GPT2-1.5b to do some Q/A, but it seems that the model keeps generating (repeating itself over and over again) until max tokens are reached....
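On the repetition issue above: GPT-2 has no built-in notion of when an answer is finished, so greedy decoding will happily loop until `max_new_tokens`. Two common mitigations are stopping early on a delimiter/EOS token and damping the logits of already-generated tokens (a CTRL-style repetition penalty). A toy decoding loop illustrating both — `step_fn` is a stand-in for a real model forward pass, not a nanoGPT API:

```python
def generate(step_fn, stop_token, max_new_tokens, repetition_penalty=1.2):
    """Toy greedy decoding loop with early stop and repetition penalty.

    step_fn(tokens) must return {token_id: logit} for the next position;
    here it stands in for a real model forward pass (an assumption).
    """
    tokens = []
    for _ in range(max_new_tokens):
        logits = dict(step_fn(tokens))            # copy so we can modify
        for t in set(tokens):                     # damp already-seen tokens
            if t in logits:
                logits[t] = (logits[t] / repetition_penalty
                             if logits[t] > 0 else logits[t] * repetition_penalty)
        nxt = max(logits, key=logits.get)         # greedy pick
        if nxt == stop_token:                     # stop on delimiter/EOS
            break
        tokens.append(nxt)
    return tokens

# toy "model": after 3 tokens the stop token (id 0) dominates
def step_fn(tokens):
    return {0: 5.0} if len(tokens) >= 3 else {0: 0.1, 7: 1.0, 8: 0.9}

print(generate(step_fn, stop_token=0, max_new_tokens=10))  # stops well before 10
```

In a real setup the stop token would be something like the tokenizer's end-of-text id, and the prompt would be formatted so the model learns (or is shown by few-shot examples) to emit it after each answer.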

It could be interesting to have a strongly opinionated guide from the author addressing some typical issues: - whether or not to freeze some layers while fine-tuning, and...

Hi there, I have a custom dataset that is quite large (~40GB), similar to the openwebtext data you present as an example. My dataset is not on Hugging Face, and running...
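A note relevant to both data-preparation questions above: nanoGPT's data loader only needs a flat binary file of uint16 token ids (`train.bin` / `val.bin`) that it can memory-map, so a custom or offline corpus can be prepared by streaming it through a tokenizer in chunks without ever holding 40GB in RAM. A rough sketch with numpy — the byte-value "tokenizer" and file name here are toy stand-ins for tiktoken and the real shard files:

```python
import numpy as np, os, tempfile

def write_bin(token_chunks, path):
    """Append token chunks to a flat uint16 file, streaming (no full corpus in RAM)."""
    with open(path, "wb") as f:
        for chunk in token_chunks:
            np.asarray(chunk, dtype=np.uint16).tofile(f)

def read_batch(path, block_size, offset=0):
    """Memory-map the file (as nanoGPT's get_batch does) and slice one block."""
    data = np.memmap(path, dtype=np.uint16, mode="r")
    return data[offset : offset + block_size]

# toy usage: "tokenize" two text shards as raw byte values
path = os.path.join(tempfile.gettempdir(), "train_demo.bin")
write_bin(([ord(c) for c in s] for s in ["hello ", "world"]), path)
print(list(read_batch(path, 5)))  # first 5 token ids
```

Because the reader is a memmap, batch sampling stays cheap regardless of file size; the only real work is the one-time streaming tokenization pass.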

I am trying to train the gpt2-small model with DDP on 8x 80GB Nvidia H100 GPUs. Regardless of the PyTorch nightly version, I always end up with the below error...

```
RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel:
    size mismatch for transformer.h.0.attn.c_attn.weight: copying a param with shape torch.Size([2304, 768]) from checkpoint, the shape in current model is torch.Size([768, 2304]).
    size mismatch...
```
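That particular mismatch (`[2304, 768]` vs `[768, 2304]`) is usually the Conv1D-vs-Linear layout difference: Hugging Face's GPT-2 stores its attention/MLP projections as `Conv1D` modules with shape `(in_features, out_features)`, while an `nn.Linear`-based model (like nanoGPT's) stores `(out_features, in_features)`, so those weights must be transposed when converting checkpoints in either direction. nanoGPT's `from_pretrained` does exactly this for a small set of key suffixes. A dependency-free sketch of the idea — transpose done on nested lists so it runs without torch, and the key list follows the one in the nanoGPT loader:

```python
def transpose(mat):
    """Plain-Python matrix transpose (stands in for tensor.t())."""
    return [list(row) for row in zip(*mat)]

# Weights stored as Conv1D on the HF side, nn.Linear on the nanoGPT side;
# these key suffixes need transposing when converting a state dict.
TRANSPOSED = ("attn.c_attn.weight", "attn.c_proj.weight",
              "mlp.c_fc.weight", "mlp.c_proj.weight")

def fix_state_dict(sd):
    """Transpose only the keys whose layout differs; pass everything else through."""
    return {k: transpose(v) if k.endswith(TRANSPOSED) else v for k, v in sd.items()}

demo = {"transformer.h.0.attn.c_attn.weight": [[1, 2, 3], [4, 5, 6]]}  # toy 2x3 "tensor"
print(fix_state_dict(demo))
```

With real tensors the comprehension body would be `v.t().contiguous()`; the point is that the fix is a per-key transpose at load time, not a change to either model.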