nanoGPT issues

Results: 297 nanoGPT issues


More positional encoding options: NoPE and ALiBi

This PR adds a flag `--pos_encoding` with the following positional encoding methods:

- `nope`: NoPE (No Positional Encoding, see [The Impact of Positional Encoding on Length Generalization in Transformers](https://arxiv.org/abs/2305.19466))
- `alibi`:...

matthiasgeihs
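For readers unfamiliar with the `alibi` option named in the PR above, here is a minimal plain-Python sketch of the ALiBi idea as it is usually described: each attention head adds a linear penalty, proportional to the query-key distance, to its attention scores. This is not code from the PR. The function names `alibi_slopes` and `alibi_bias` are hypothetical, and the sketch assumes a power-of-two number of heads and causal attention.

```python
def alibi_slopes(n_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8) for n heads.

    Assumption: n_heads is a power of two (the simple case).
    """
    assert n_heads > 0 and n_heads & (n_heads - 1) == 0, "sketch assumes power-of-two heads"
    return [2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]


def alibi_bias(seq_len, slope):
    """Bias added to one head's attention scores before softmax.

    bias[i][j] = -slope * (i - j) for keys j at or before query i;
    future keys (j > i) are masked with -inf, as in causal attention.
    """
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(8)          # one slope per head
bias = alibi_bias(4, slopes[0])   # 4x4 bias matrix for the first head
```

Under this scheme no positional embedding is added to the token embeddings; position enters only through the bias. NoPE goes one step further and adds no positional signal at all, relying on the causal mask alone.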

Update README.md

Just a test, please ignore.

ysx1223

Can you guys please help to look at my implementation of intersecting attention?

```
class IMQA(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.num_heads = config.n_head
        self.key_dim = config.n_embd // config.n_head
        self.q_proj = nn.Linear(config.n_embd, config.n_embd)
        self.k_proj = nn.Linear(config.n_embd, config.n_embd)
        self.v_proj = nn.Linear(config.n_embd, config.n_embd)
        self.scale = (self.key_dim...
```

win10ogod

Mac M1 mps error mps.scatter_nd when iterating across architectures

Hi all, I'm on Apple M1 and used the nanoGPT train.py to make a train_gpt(...) function, so I can sweep across architectures (varying number of layers, heads and embeddings). When...

mercicle

Added snapshots to train.py

Modified the train.py script to make snapshots of checkpoints. Added 3 new config values:

- take_snapshots (default = False): if True, saves snapshots of checkpoints at specified conditions
- snapshot_dir...

MicroPanda123

Reproducing GPT-2 results on M1 Mac

I'm trying to reproduce GPT-2 results on my local machine, but I'm running into various errors related to CUDA being enabled even when I set the device to mps. I'm wondering...

manavramprasad

Loss calculation

What is the purpose of ignore_index=-1 in loss calculation? I understand it's usually applied to exclude special tokens like padding, sequence end, etc. But nanoGPT does not seem to use...

mailgpa

Running the code

When I try to run the code it doesn't work for some random reason... ![image](https://github.com/karpathy/nanoGPT/assets/136958814/02189d2d-7d4b-4a3d-a7bb-57365f5b3313) It does not talk like a normal chatbot would.

ionbotYT

nit: remove explicit ignore_index=-1

I believe setting `ignore_index=-1` when calling `cross_entropy` on line `187` is spurious, as:

1. `ignore_index=N` doesn't ignore the Nth index in the target list, it ignores any values in the...

transmissions11
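The `ignore_index` semantics discussed in the issues above can be illustrated with a small plain-Python sketch. It mimics the documented behavior of `torch.nn.functional.cross_entropy` with mean reduction, where a position is skipped when its target value equals `ignore_index`, regardless of where it sits in the sequence. The helper below is a hypothetical reimplementation for illustration only, not PyTorch's or nanoGPT's code, and the logits are made-up numbers.

```python
import math


def cross_entropy(logits, targets, ignore_index=-100):
    """Mean cross-entropy over rows whose target is not ignore_index."""
    total, count = 0.0, 0
    for row, target in zip(logits, targets):
        if target == ignore_index:
            # Skipped because the target VALUE matches, not because of its position.
            continue
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[target]
        count += 1
    return total / count if count else float("nan")


# Three positions, vocabulary of size 3 (illustrative values).
logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.1, 0.2, 3.0]]

# No target equals -1, so ignore_index=-1 changes nothing: all rows count.
loss_all = cross_entropy(logits, [0, 1, 2], ignore_index=-1)

# The middle target is -1, so only that row is dropped from the average.
loss_masked = cross_entropy(logits, [0, -1, 2], ignore_index=-1)

# Same as computing the loss on the first and last rows only.
loss_first_and_last = cross_entropy([logits[0], logits[2]], [0, 2], ignore_index=-1)
```

In this sketch the last position always contributes unless its target is literally `-1`, which is why the setting has no effect when no target in the batch is ever `-1`.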

Why is ignore_index set to -1 in cross_entropy?

Why is `ignore_index` set to -1 in `cross_entropy`? According to the [doc](https://pytorch.org/docs/stable/generated/torch.nn.functional.cross_entropy.html), `ignore_index` being -1 means the last token in the predicted sequence "is ignored and does not contribute...

randbear
