nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Results: 297 nanoGPT issues, sorted by recently updated

This PR adds a `--pos_encoding` flag with the following positional encoding methods:
- `nope`: NoPE (No Positional Encoding, see [The Impact of Positional Encoding on Length Generalization in Transformers](https://arxiv.org/abs/2305.19466))
- `alibi`:...
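Of the listed methods, NoPE simply omits positional information, while ALiBi (Press et al., "Train Short, Test Long") adds a fixed, head-specific penalty to each attention score that grows linearly with the query-key distance. The preview is cut off before the PR's implementation, so the following is only a standalone pure-Python sketch of the ALiBi bias as described in the paper, assuming causal attention and a power-of-two head count (the paper interpolates slopes for other counts). The function names are hypothetical.

```python
def alibi_slopes(n_heads):
    # Geometric slopes from the ALiBi paper for a power-of-two head count:
    # head h (1-indexed) gets slope 2 ** (-8 * h / n_heads).
    assert n_heads > 0 and n_heads & (n_heads - 1) == 0, "sketch handles power-of-two head counts only"
    return [2 ** (-8 * h / n_heads) for h in range(1, n_heads + 1)]


def alibi_bias(n_heads, seq_len):
    # bias[h][i][j] is added to head h's attention score of query i over key j.
    # Past keys are penalized by slope * distance; future keys are masked with -inf (causal).
    bias = []
    for m in alibi_slopes(n_heads):
        rows = []
        for i in range(seq_len):
            rows.append([-m * (i - j) if j <= i else float("-inf") for j in range(seq_len)])
        bias.append(rows)
    return bias
```

Because the bias depends only on distance, no positional embedding table is needed, which is the property ALiBi relies on for extrapolating to longer sequences.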

Just a try, please ignore.

```python
class IMQA(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.num_heads = config.n_head
        self.key_dim = config.n_embd // config.n_head
        self.q_proj = nn.Linear(config.n_embd, config.n_embd)
        self.k_proj = nn.Linear(config.n_embd, config.n_embd)
        self.v_proj = nn.Linear(config.n_embd, config.n_embd)
        self.scale = (self.key_dim...
```
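The preview stops at the `self.scale` line and never reaches the forward pass, so IMQA's actual attention computation isn't visible. For comparison only, here is a pure-Python sketch of the standard multi-head step that `key_dim = config.n_embd // config.n_head` sets up: each projected vector is sliced into `n_head` pieces and queries are scored against keys per head, scaled by 1/sqrt(key_dim) as in conventional scaled dot-product attention. Whether IMQA uses this scale is an assumption, and `split_heads` / `per_head_scores` are hypothetical names.

```python
import math


def split_heads(vec, n_head):
    # Slice a flat n_embd-length vector into n_head chunks of key_dim = n_embd // n_head.
    assert len(vec) % n_head == 0, "n_embd must be divisible by n_head"
    key_dim = len(vec) // n_head
    return [vec[h * key_dim:(h + 1) * key_dim] for h in range(n_head)]


def per_head_scores(q, keys, n_head):
    # Scaled dot-product scores of one query against several keys,
    # computed independently per head with scale 1 / sqrt(key_dim).
    key_dim = len(q) // n_head
    scale = 1.0 / math.sqrt(key_dim)
    q_heads = split_heads(q, n_head)
    k_heads = [split_heads(k, n_head) for k in keys]
    return [
        [sum(a * b for a, b in zip(q_heads[h], kh[h])) * scale for kh in k_heads]
        for h in range(n_head)
    ]
```

Scaling by 1/sqrt(key_dim) keeps the dot products from growing with head size, which would otherwise push the softmax toward one-hot outputs.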

Hi all, I'm on Apple M1 and used the nanoGPT train.py to make a train_gpt(...) function, so I can sweep across architectures (varying number of layers, heads and embeddings). When...
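A sweep like this typically starts by enumerating shape combinations. The pure-Python sketch below skips combinations nanoGPT would reject (`CausalSelfAttention` asserts `config.n_embd % config.n_head == 0`) and attaches the usual rough parameter estimate of 12 * n_layer * n_embd**2 (attention 4 * n_embd**2 plus MLP 8 * n_embd**2 per block, ignoring embeddings, biases and LayerNorms). `sweep_configs` and `approx_params` are hypothetical helpers, and since the signature of the user's `train_gpt(...)` isn't shown, the call is only indicated in a comment.

```python
import itertools


def sweep_configs(n_layers, n_heads, n_embds):
    # One config dict per (n_layer, n_head, n_embd) combination, skipping shapes
    # that would fail nanoGPT's n_embd % n_head == 0 assertion.
    configs = []
    for n_layer, n_head, n_embd in itertools.product(n_layers, n_heads, n_embds):
        if n_embd % n_head != 0:
            continue
        configs.append({"n_layer": n_layer, "n_head": n_head, "n_embd": n_embd})
    return configs


def approx_params(cfg):
    # Rough non-embedding parameter count: 4*d^2 (attention) + 8*d^2 (MLP) per layer.
    return 12 * cfg["n_layer"] * cfg["n_embd"] ** 2


if __name__ == "__main__":
    for cfg in sweep_configs([2, 4], [4, 6], [128, 192]):
        print(cfg, approx_params(cfg))
        # train_gpt(**cfg)  # hypothetical: depends on the user's function signature
```

Sorting the configs by `approx_params` before training can make it easier to spot where memory limits start to bite on a single machine.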

Modified the train.py script to save snapshots of checkpoints. Added 3 new config values:
- `take_snapshots` (default = False): if True, saves snapshots of checkpoints at specified conditions
- `snapshot_dir`...
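nanoGPT's `train.py` writes a single `ckpt.pt` into `out_dir` and overwrites it on each save, which is what snapshotting works around. The PR's actual trigger conditions are cut off in the preview, so this sketch assumes a hypothetical `snapshot_interval` (snapshot every N iterations). Only `take_snapshots` and `snapshot_dir` come from the PR description; `maybe_snapshot` is an illustrative name, and the approach (copying the existing checkpoint file rather than re-serializing the model) is one possible design, not the PR's.

```python
import os
import shutil


def maybe_snapshot(out_dir, snapshot_dir, iter_num, take_snapshots, snapshot_interval):
    # Hypothetical trigger: snapshot every `snapshot_interval` iterations when enabled.
    if not take_snapshots or iter_num % snapshot_interval != 0:
        return None
    os.makedirs(snapshot_dir, exist_ok=True)
    src = os.path.join(out_dir, "ckpt.pt")
    # Zero-padded iteration number keeps snapshots sorted by file name.
    dst = os.path.join(snapshot_dir, f"ckpt_{iter_num:06d}.pt")
    shutil.copy2(src, dst)  # copy so later saves to ckpt.pt don't overwrite this snapshot
    return dst
```

Called right after the checkpoint is written, this keeps the regular save path unchanged and leaves older states on disk for later comparison.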

I'm trying to reproduce GPT-2 results on my local machine, but I keep getting errors related to CUDA being enabled, even when I set the device to `mps`. I'm wondering...
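The specific errors aren't shown, so the cause can't be pinned down here, but one relevant detail is how nanoGPT's `train.py` derives `device_type`: `'cuda' if 'cuda' in device else 'cpu'`, so an `mps` device takes the non-CUDA branches for autocast and pinned memory. The sketch below mirrors that check and adds a simple backend-preference helper. `pick_device` is hypothetical and takes availability flags so it runs without torch; in practice they would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    # Prefer CUDA, then Apple's Metal (MPS) backend, then CPU.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def device_type_for(device: str) -> str:
    # Same rule as train.py: any device string without "cuda" (including "mps")
    # is treated as "cpu" for autocast and data-loading decisions.
    return "cuda" if "cuda" in device else "cpu"


# Usage with torch installed (not executed here):
# device = pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
```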

What is the purpose of `ignore_index=-1` in the loss calculation? I understand it's usually applied to exclude special tokens like padding, end-of-sequence, etc. But nanoGPT does not seem to use...
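PyTorch documents `ignore_index` as a *target value* to skip: positions whose target equals it are dropped, and with the default mean reduction the loss is averaged over the remaining positions. PyTorch's own default is `ignore_index=-100`. In nanoGPT, `get_batch` builds targets from token ids, which are never negative, so `-1` never matches during default training and the argument has no effect there. The pure-Python sketch below illustrates those semantics; it is not nanoGPT's code, and returning NaN when every position is ignored is an assumption meant to mirror a mean over zero elements.

```python
import math


def cross_entropy(logits, targets, ignore_index=-1):
    # logits: list of per-position score lists; targets: list of class ids.
    # Positions whose *target value* equals ignore_index are skipped, and the
    # loss is averaged over the positions that remain.
    total, count = 0.0, 0
    for row, t in zip(logits, targets):
        if t == ignore_index:
            continue  # checked before row[t]: as a Python index, -1 would pick the last class
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[t]
        count += 1
    return total / count if count else float("nan")
```

The check runs on the target value before any indexing, which is why `-1` here never means "the last position" or "the last class".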

When I try to run the code, it doesn't work for some reason... ![image](https://github.com/karpathy/nanoGPT/assets/136958814/02189d2d-7d4b-4a3d-a7bb-57365f5b3313) It does not talk like a normal chatbot would.

I believe setting `ignore_index=-1` when calling `cross_entropy` on line `187` is spurious, as: 1. `ignore_index=N` doesn't ignore the Nth index in the target list; it ignores any values in the...
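A hypothetical case where the value-based semantics would matter is right-padded sequences in a finetuning setup (nanoGPT's default pipeline does not pad). Replacing the targets that would be padding with `-1` makes a loss called with `ignore_index=-1` skip those positions wherever they occur. `make_example` and `pad_id` are illustrative names, not part of nanoGPT.

```python
def make_example(tokens, pad_id, ignore_index=-1):
    # Next-token setup: input position i predicts tokens[i + 1].
    x = tokens[:-1]
    # Any target equal to pad_id is replaced by ignore_index so the loss skips it.
    y = [ignore_index if t == pad_id else t for t in tokens[1:]]
    return x, y
```

One caveat of this design: if `pad_id` is also a real token id, genuine occurrences get masked too, so real setups often track padding with explicit lengths or a mask instead.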

Why is `ignore_index` set to -1 in the `cross_entropy` call? According to the [doc](https://pytorch.org/docs/stable/generated/torch.nn.functional.cross_entropy.html), an `ignore_index` of -1 means the last token in the predicted sequence "is ignored and does not contribute...