Chris Esposo

Results: 2 issues by Chris Esposo

Summary: This PR addresses two things. First, it extends model_ext.py and train_sat.py from leyan_branch with my additions from the previous PR. Second, it addresses some run-time issues in the flash...

Option 1 - Flag Ablation code - ReLU - Restricted Setting - Softmax q-error
Option 2 - Flag Scale Attention Weights based on the context window * log(query_position) [error...