Chris Esposo
Results: 2 issues
Summary: This PR addresses two things. First, it extends model_ext.py and train_sat.py from leyan_branch with my additions from the previous PR. Second, it addresses some run-time issues in the flash...
Option 1 - Flag: Ablation code - ReLU - Restricted Setting - Softmax q-error
Option 2 - Flag: Scale Attention Weights based on the context window * log(query_position) [error...