Soft prompts
In code:

```python
import transformers

t = transformers.AutoModel.from_pretrained("gpt2")
twp = make_prefix_transformer(t, prefix_length=3)
```
In config files:

```jsonnet
{
    model: {
        type: "transformers::with_soft_prompt",
        prompt_length: 3,
        model: {
            type: "transformers::AutoModelForCausalLM::from_pretrained",
            pretrained_model_name_or_path: "gpt2"
        },
    }
}
```
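For anyone new to soft prompts, here is roughly the idea behind the wrapper shown above, in either form. This is a sketch under assumptions: `SoftPromptWrapper` and its internals are illustrative only, not this PR's implementation, which patches the model in place rather than wrapping it. The core move is just to learn a few embedding vectors and prepend them to the token embeddings before they go through the model.

```python
import torch
import transformers


class SoftPromptWrapper(torch.nn.Module):
    """Illustrative sketch only: prepend learned prefix embeddings to the input."""

    def __init__(self, model: transformers.PreTrainedModel, prefix_length: int):
        super().__init__()
        self.model = model
        embedding_dim = model.get_input_embeddings().embedding_dim
        # The only new parameters: one learned vector per prefix position.
        self.prefix = torch.nn.Parameter(torch.randn(prefix_length, embedding_dim) * 0.02)

    def forward(self, input_ids, attention_mask=None, **kwargs):
        batch_size = input_ids.shape[0]
        token_embeddings = self.model.get_input_embeddings()(input_ids)
        prefix = self.prefix.unsqueeze(0).expand(batch_size, -1, -1)
        inputs_embeds = torch.cat([prefix, token_embeddings], dim=1)
        if attention_mask is not None:
            prefix_mask = attention_mask.new_ones(batch_size, self.prefix.shape[0])
            attention_mask = torch.cat([prefix_mask, attention_mask], dim=1)
        # Labels and past_key_values handling are deliberately left out of this sketch.
        return self.model(inputs_embeds=inputs_embeds, attention_mask=attention_mask, **kwargs)
```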
Missing:
- [x] Tests
- [x] Docs
- [x] Try it with T5
- [ ] A proper end-to-end training config that uses this
- [x] Add an easy way to make only the prefix trainable, and leave the rest of the weights alone
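On the "make only the prefix trainable" item: the usual way to do this is to turn off gradients for every original parameter and leave only the new prefix parameters trainable. A minimal sketch follows; the parameter name `prefix` is an assumption about how the new parameters are registered.

```python
def freeze_all_but_prefix(model, prefix_param_name: str = "prefix"):
    """Illustrative: disable gradients for everything except parameters whose
    name contains the (assumed) name of the prefix parameter."""
    for name, param in model.named_parameters():
        param.requires_grad = prefix_param_name in name
```

The optimizer then only needs to see `[p for p in model.parameters() if p.requires_grad]`.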
Oh no, I found a big problem with this. It doesn't work with `past_key_values`. Fix incoming.
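For context on why `past_key_values` breaks this (my reading of the generation loop, not the actual fix): after the first step, `generate()` calls the model with only the newest token plus the cached keys and values, so a patched `forward()` that unconditionally prepends the prefix would re-insert it on every step. The fix presumably needs a guard along these lines; `prepend_soft_prompt` and `original_forward` are hypothetical names used only for the sketch.

```python
def patched_forward(input_ids=None, attention_mask=None, past_key_values=None, **kwargs):
    if past_key_values is None:
        # First call: prepend the learned prefix (as inputs_embeds) and widen the mask.
        inputs_embeds, attention_mask = prepend_soft_prompt(input_ids, attention_mask)
        return original_forward(inputs_embeds=inputs_embeds, attention_mask=attention_mask, **kwargs)
    # Later steps: the prefix already lives in the cached keys/values, so pass the new
    # token straight through (the mask still has to cover the prefix positions).
    return original_forward(
        input_ids=input_ids, attention_mask=attention_mask, past_key_values=past_key_values, **kwargs
    )
```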
This does not work for T5 at all 😭. I'm no longer sure this approach of patching the model will work. The huggingface generation code makes calls into the middle of their model, instead of always going through the `forward()` method. So patching `forward()` doesn't work. And patching the `forward()` of an internal module breaks all sorts of assumptions that other parts of the code have about that `forward()` method.
> This does not work for T5 at all 😭. I'm no longer sure this approach of patching the model will work. The huggingface generation code makes calls into the middle of their model, instead of always going through the `forward()` method. So patching `forward()` doesn't work.
Is this problematic for generation only?
> And patching the `forward()` of an internal module breaks all sorts of assumptions that other parts of the code have about that `forward()` method.
This is what I was worrying about above.
Copying from Slack:
> I can patch just the encoder for T5. Then the soft prompt has the opportunity to change how the rest of the prompt is encoded. But the encoded soft tokens are not part of the encoder output, and cannot be attended to by the decoder. @ZhaofengWu, is that important?
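For concreteness, here is roughly what patching just the encoder could look like. This is a sketch only; the use of `embed_tokens` and the slicing of `last_hidden_state` are my assumptions, not necessarily what the real code does. It also shows why the decoder cannot attend to the soft tokens: the prefix positions are sliced off the encoder output, so everything downstream sees the original sequence length.

```python
import torch


def patch_t5_encoder(model, prefix: torch.nn.Parameter):
    """Illustrative sketch: wrap the encoder's forward() so the soft prompt can
    influence encoding, but is dropped before the decoder sees the output."""
    encoder = model.get_encoder()
    original_forward = encoder.forward
    prefix_length = prefix.shape[0]

    def forward_with_prefix(input_ids=None, attention_mask=None, **kwargs):
        token_embeddings = encoder.embed_tokens(input_ids)
        batch_size = token_embeddings.shape[0]
        expanded_prefix = prefix.unsqueeze(0).expand(batch_size, -1, -1)
        inputs_embeds = torch.cat([expanded_prefix, token_embeddings], dim=1)
        if attention_mask is not None:
            prefix_mask = attention_mask.new_ones(batch_size, prefix_length)
            attention_mask = torch.cat([prefix_mask, attention_mask], dim=1)
        output = original_forward(inputs_embeds=inputs_embeds, attention_mask=attention_mask, **kwargs)
        # Drop the prefix positions so downstream code sees the original length;
        # this is exactly why the decoder cannot attend to the encoded soft tokens.
        output.last_hidden_state = output.last_hidden_state[:, prefix_length:]
        return output

    encoder.forward = forward_with_prefix
```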
Just to resolve this chain of comments: I made it work with T5.
This is ready for another review.