Easy-Transformer
[Question] Generation not possible with hooks?
I'm trying to implement various methods from the Patchscopes paper, some of which use token generation to, e.g., explain the meaning of a patched representation.
I tried this and it mostly works: if you run .generate() on a hooked model, the hooks do apply during generation. The next difficulty is that after the first forward pass, the shape of target_activations becomes [x, 1, x], i.e. the sequence dimension shrinks to 1. I assume this is because the earlier activations are held in the KV cache, and the model only computes activations for the new token on subsequent forward passes?
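For illustration, one workaround is to make the hook itself aware of the cached passes: patch only when it sees the full-length prompt, and skip the length-1 passes that follow. Below is a minimal sketch of that guard. The names (`make_patching_hook`, `target_activations`) are hypothetical, the hook follows a TransformerLens-style `(activation, hook)` signature, and NumPy arrays stand in for real activation tensors:

```python
import numpy as np

def make_patching_hook(patch_vector, position):
    """Build a hook that writes patch_vector into `position` of the
    activation, but only on the first (full-length) forward pass.
    On later generation steps the KV cache means the activation has
    sequence length 1, so the hook leaves it untouched."""
    def hook(activation, hook_point=None):
        # activation: [batch, seq_len, d_model]
        if activation.shape[1] == 1:
            # KV-cache pass computing a single new token -- skip patching.
            return activation
        activation[:, position, :] = patch_vector
        return activation
    return hook

# Dummy tensors standing in for real target_activations.
first_pass = np.zeros((1, 5, 4))   # full prompt: seq_len == 5
cached_pass = np.ones((1, 1, 4))   # later generation step: seq_len == 1

hook = make_patching_hook(np.full(4, 7.0), position=2)
patched = hook(first_pass)         # position 2 is overwritten
untouched = hook(cached_pass)      # left as-is
```

The same shape check works inside a real hook passed to `run_with_hooks` or a `model.hooks(...)` context around `.generate()`, though whether that positional indexing is the right patching semantics for your use case depends on the Patchscopes variant you're implementing.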
I was hoping the hooks might get cleared after the first pass of generation, but that doesn't seem to be the case. Clearing them there might be a quick fix for making this work.
For future reference, I got multi-token generating working with nnsight and implemented patchscopes here: https://github.com/jcoombes/obvs/pull/22