nanoGPT
Dec branch commit
Option 1 - Flag Ablation code - ReLU - Restricted Setting - Softmax q-error +
Option 2 - Flag Scale attention weights based on the context window * log(query_position) [an error that grows with context length can be bounded by the context length]
Option 3 - Flag Instead of using positional encoding, we replace the first field with a recurrent neural network or an LSTM
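
A minimal sketch of one possible reading of Option 2, not the code on the branch. It assumes the flag multiplies the pre-softmax attention logits of each query by log(query_position) / log(block_size), where query_position is 1-indexed and block_size is the context window, so the scale reaches 1 at the full window. The names position_scale, causal_attention_weights and log_scale are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def position_scale(query_pos, block_size):
    """Assumed scale factor: log(query_pos) / log(block_size).

    query_pos is 1-indexed, so under a causal mask it equals the number
    of keys the query can see. Requires block_size > 1.
    """
    return math.log(query_pos) / math.log(block_size)


def causal_attention_weights(q, k, block_size, log_scale=True):
    """Return causal attention weights; row i holds i + 1 weights.

    q and k are lists of T vectors (lists of floats) of equal dimension.
    log_scale is a hypothetical flag that toggles the position scaling.
    """
    d = len(q[0])
    rows = []
    for i, qi in enumerate(q):
        scale = position_scale(i + 1, block_size) if log_scale else 1.0
        logits = [
            scale * sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
            for j in range(i + 1)  # causal mask: keys 0..i only
        ]
        rows.append(softmax(logits))
    return rows


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    for row in causal_attention_weights(q, k, block_size=4):
        print([round(w, 3) for w in row])
```

Under this assumption the first query has scale 0, which is harmless because it sees only one key, and a query at the end of the context window is left unscaled. Whether the branch normalizes by log(block_size) or uses a different constant is not stated in the notes.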