Soft prompts

Open · dirkgr opened this issue Mar 15 '22 · 5 comments

In code:

import transformers
t = transformers.AutoModel.from_pretrained("gpt2")
twp = make_prefix_transformer(t, prefix_length=3)

In config files:

{
    model: {
        type: "transformers::with_soft_prompt",
        prompt_length: 3,
        model: {
            type: "transformers::AutoModelForCausalLM::from_pretrained",
            pretrained_model_name_or_path: "gpt2"
        },
    }
}
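
For concreteness, a rough sketch of what make_prefix_transformer might do under the hood. This is not the actual implementation, just the general idea: register a trainable prompt matrix, prepend it to the input embeddings, and patch forward().

import torch

def make_prefix_transformer(model, prefix_length: int):
    # Register a trainable prompt matrix of shape (prefix_length, hidden_size).
    embedding_dim = model.get_input_embeddings().embedding_dim
    prompt = torch.nn.Parameter(torch.randn(prefix_length, embedding_dim) * 0.02)
    model.register_parameter("soft_prompt", prompt)

    original_forward = model.forward

    def forward_with_prompt(input_ids=None, attention_mask=None, **kwargs):
        # Look up the regular token embeddings, then prepend the soft prompt.
        inputs_embeds = model.get_input_embeddings()(input_ids)
        batch_size = inputs_embeds.shape[0]
        prompt_embeds = prompt.unsqueeze(0).expand(batch_size, -1, -1)
        inputs_embeds = torch.cat([prompt_embeds, inputs_embeds], dim=1)
        if attention_mask is not None:
            prefix_mask = attention_mask.new_ones(batch_size, prefix_length)
            attention_mask = torch.cat([prefix_mask, attention_mask], dim=1)
        return original_forward(
            inputs_embeds=inputs_embeds, attention_mask=attention_mask, **kwargs
        )

    model.forward = forward_with_prompt
    return model

As the comments below point out, this naive version is not enough once generation with past_key_values or an encoder-decoder model like T5 is involved.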

Missing:

  • [x] Tests
  • [x] Docs
  • [x] Try it with T5
  • [ ] A proper end-to-end training config that uses this
  • [x] Add an easy way to make only the prefix trainable, and leave the rest of the weights alone
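
For that last item, a minimal sketch of one way to do it, assuming the prompt is registered as a parameter named soft_prompt as in the sketch above (a hypothetical name, not necessarily what the real implementation uses):

def freeze_all_but_prompt(model):
    # Turn off gradients for every weight except the soft prompt itself.
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith("soft_prompt")
    return model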

dirkgr · Mar 15 '22 18:03

Oh no, I found a big problem with this. It doesn't work with past_key_values. Fix incoming.
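
Rough context for why this breaks: during incremental decoding, generate() passes only the newest token together with the cached past_key_values, so a patched forward() must not prepend the prompt again on those calls. A hedged sketch of the kind of guard that is needed, written as a drop-in replacement for forward_with_prompt from the sketch in the issue description (not the actual fix):

    def forward_with_prompt(
        input_ids=None, attention_mask=None, past_key_values=None, **kwargs
    ):
        inputs_embeds = model.get_input_embeddings()(input_ids)
        if past_key_values is None:
            # First call: prepend the soft prompt as before.
            batch_size = inputs_embeds.shape[0]
            prompt_embeds = prompt.unsqueeze(0).expand(batch_size, -1, -1)
            inputs_embeds = torch.cat([prompt_embeds, inputs_embeds], dim=1)
        # On later calls the prompt is already baked into past_key_values, but
        # the attention mask still has to cover the extra prefix positions.
        if attention_mask is not None:
            prefix_mask = attention_mask.new_ones(attention_mask.shape[0], prefix_length)
            attention_mask = torch.cat([prefix_mask, attention_mask], dim=1)
        return original_forward(
            inputs_embeds=inputs_embeds,
            attention_mask=attention_mask,
            past_key_values=past_key_values,
            **kwargs,
        )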

dirkgr · Mar 16 '22 00:03

This does not work for T5 at all 😭. I'm no longer sure this approach of patching the model will work. The huggingface generation code makes calls into the middle of their model, instead of always going through the forward() method. So patching forward() doesn't work. And patching forward() of an internal module breaks all sorts of assumptions that other parts of the code have about that forward() method.

dirkgr · Mar 16 '22 21:03

> This does not work for T5 at all 😭. I'm no longer sure this approach of patching the model will work. The huggingface generation code makes calls into the middle of their model, instead of always going through the forward() method. So patching forward() doesn't work.

Is this problematic for generation only?

> And patching forward() of an internal module breaks all sorts of assumptions that other parts of the code have about that forward() method.

This is what I was worrying about above.

ZhaofengWu · Mar 16 '22 21:03

Copying from Slack:

I can patch just the encoder for T5. Then the soft prompt has the opportunity to change how the rest of the prompt is encoded. But the encoded soft tokens are not part of the encoder output, and cannot be attended to by the decoder. @ZhaofengWu, is that important?
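
Roughly, the variant described above could look like this (a sketch with hypothetical names; prompt and prefix_length as in the earlier sketch). Note how the prompt positions are sliced off at the end, which is exactly why the decoder never gets to attend to the encoded soft tokens:

import torch

def patch_t5_encoder(model, prompt, prefix_length):
    encoder = model.get_encoder()
    original_encoder_forward = encoder.forward

    def encoder_forward(input_ids=None, attention_mask=None, **kwargs):
        # Prepend the soft prompt to the encoder's input embeddings.
        inputs_embeds = encoder.get_input_embeddings()(input_ids)
        batch_size = inputs_embeds.shape[0]
        prompt_embeds = prompt.unsqueeze(0).expand(batch_size, -1, -1)
        inputs_embeds = torch.cat([prompt_embeds, inputs_embeds], dim=1)
        if attention_mask is not None:
            prefix_mask = attention_mask.new_ones(batch_size, prefix_length)
            attention_mask = torch.cat([prefix_mask, attention_mask], dim=1)
        outputs = original_encoder_forward(
            inputs_embeds=inputs_embeds, attention_mask=attention_mask, **kwargs
        )
        # Drop the prompt positions again so the output length matches what the
        # decoder and the generation code expect; the encoded soft tokens are lost.
        outputs.last_hidden_state = outputs.last_hidden_state[:, prefix_length:, :]
        return outputs

    encoder.forward = encoder_forward
    return model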

dirkgr · Mar 16 '22 22:03

Just to resolve this chain of comments: I made it work with T5.

dirkgr · Mar 17 '22 17:03

This is ready for another review.

dirkgr · Nov 30 '22 20:11