Issues by Michaelikarasik (1 result)

I'm attempting to zero-ablate all self-attention outputs at the last token position across all layers, so that the model's prediction depends only on the last token and is not affected...
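
A minimal sketch of that ablation, assuming a TransformerLens `HookedTransformer` (the issue doesn't name the library or model; `"gpt2"`, the prompt, and the hook function name below are illustrative): zero the attention block's contribution to the residual stream at the final position in every layer, then compare the ablated logits against a clean run.

```python
# Hedged sketch: zero-ablate the attention output at the last token
# position in every layer. Assumes TransformerLens; "gpt2" and the
# prompt are placeholder choices, not taken from the original issue.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The Eiffel Tower is located in")

def zero_last_pos_attn(attn_out, hook):
    # attn_out has shape [batch, pos, d_model]: the attention block's
    # contribution to the residual stream. Zeroing it at the final
    # position removes all attention output for the last token.
    attn_out[:, -1, :] = 0.0
    return attn_out

# Attach the hook to every layer's attention output.
fwd_hooks = [(f"blocks.{layer}.hook_attn_out", zero_last_pos_attn)
             for layer in range(model.cfg.n_layers)]

clean_logits = model(tokens)
ablated_logits = model.run_with_hooks(tokens, fwd_hooks=fwd_hooks)

# Compare next-token predictions with and without the ablation.
print(model.tokenizer.decode(clean_logits[0, -1].argmax().item()))
print(model.tokenizer.decode(ablated_logits[0, -1].argmax().item()))
```

Note one subtlety the truncated issue may be getting at: with this ablation the last position still receives information from earlier tokens indirectly, since each layer's MLP and residual stream at earlier positions were themselves shaped by attention; only the direct attention writes at the final position are removed.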