Soft prompts

Open · dirkgr opened this issue Mar 15 '22 · 5 comments

In code:

import transformers
t = transformers.AutoModel.from_pretrained("gpt2")
twp = make_prefix_transformer(t, prefix_length=3)

In config files:

{
    model: {
        type: "transformers::with_soft_prompt",
        prompt_length: 3,
        model: {
            type: "transformers::AutoModelForCausalLM::from_pretrained",
            pretrained_model_name_or_path: "gpt2"
        },
    }
}
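
For concreteness, a rough sketch of what make_prefix_transformer might do under the hood. This is not the actual implementation, just the general idea: register a trainable prompt matrix, prepend it to the input embeddings, and patch forward().

import torch

def make_prefix_transformer(model, prefix_length: int):
    # Register a trainable prompt matrix of shape (prefix_length, hidden_size).
    embedding_dim = model.get_input_embeddings().embedding_dim
    prompt = torch.nn.Parameter(torch.randn(prefix_length, embedding_dim) * 0.02)
    model.register_parameter("soft_prompt", prompt)

    original_forward = model.forward

    def forward_with_prompt(input_ids=None, attention_mask=None, **kwargs):
        # Look up the regular token embeddings, then prepend the soft prompt.
        inputs_embeds = model.get_input_embeddings()(input_ids)
        batch_size = inputs_embeds.shape[0]
        prompt_embeds = prompt.unsqueeze(0).expand(batch_size, -1, -1)
        inputs_embeds = torch.cat([prompt_embeds, inputs_embeds], dim=1)
        if attention_mask is not None:
            prefix_mask = attention_mask.new_ones(batch_size, prefix_length)
            attention_mask = torch.cat([prefix_mask, attention_mask], dim=1)
        return original_forward(
            inputs_embeds=inputs_embeds, attention_mask=attention_mask, **kwargs
        )

    model.forward = forward_with_prompt
    return model

As the comments below point out, this naive version is not enough once generation with past_key_values or an encoder-decoder model like T5 is involved.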

Missing:

  • [x] Tests
  • [x] Docs
  • [x] Try it with T5
  • [ ] A proper end-to-end training config that uses this
  • [x] Add an easy way to make only the prefix trainable, and leave the rest of the weights alone
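
For that last item, a minimal sketch of one way to do it, assuming the prompt is registered as a parameter named soft_prompt as in the sketch above (a hypothetical name, not necessarily what the real implementation uses):

def freeze_all_but_prompt(model):
    # Turn off gradients for every weight except the soft prompt itself.
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith("soft_prompt")
    return model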

dirkgr · Mar 15 '22 18:03

Oh no, I found a big problem with this. It doesn't work with past_key_values. Fix incoming.
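
Rough context for why this breaks: during incremental decoding, generate() passes only the newest token together with the cached past_key_values, so a patched forward() must not prepend the prompt again on those calls. A hedged sketch of the kind of guard that is needed, written as a drop-in replacement for forward_with_prompt from the sketch in the issue description (not the actual fix):

    def forward_with_prompt(
        input_ids=None, attention_mask=None, past_key_values=None, **kwargs
    ):
        inputs_embeds = model.get_input_embeddings()(input_ids)
        if past_key_values is None:
            # First call: prepend the soft prompt as before.
            batch_size = inputs_embeds.shape[0]
            prompt_embeds = prompt.unsqueeze(0).expand(batch_size, -1, -1)
            inputs_embeds = torch.cat([prompt_embeds, inputs_embeds], dim=1)
        # On later calls the prompt is already baked into past_key_values, but
        # the attention mask still has to cover the extra prefix positions.
        if attention_mask is not None:
            prefix_mask = attention_mask.new_ones(attention_mask.shape[0], prefix_length)
            attention_mask = torch.cat([prefix_mask, attention_mask], dim=1)
        return original_forward(
            inputs_embeds=inputs_embeds,
            attention_mask=attention_mask,
            past_key_values=past_key_values,
            **kwargs,
        )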

dirkgr · Mar 16 '22 00:03

This does not work for T5 at all 😭. I'm no longer sure this approach of patching the model will work. The huggingface generation code makes calls into the middle of their model, instead of always going through the forward() method. So patching forward() doesn't work. And patching forward() of an internal module breaks all sorts of assumptions that other parts of the code have about that forward() method.

dirkgr · Mar 16 '22 21:03

> This does not work for T5 at all 😭. I'm no longer sure this approach of patching the model will work. The huggingface generation code makes calls into the middle of their model, instead of always going through the forward() method. So patching forward() doesn't work.

Is this problematic for generation only?

> And patching forward() of an internal module breaks all sorts of assumptions that other parts of the code have about that forward() method.

This is what I was worrying about above.

ZhaofengWu · Mar 16 '22 21:03

Copying from Slack:

I can patch just the encoder for T5. Then the soft prompt has the opportunity to change how the rest of the prompt is encoded. But the encoded soft tokens are not part of the encoder output, and cannot be attended to by the decoder. @ZhaofengWu, is that important?
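
Roughly, the variant described above could look like this (a sketch with hypothetical names; prompt and prefix_length as in the earlier sketch). Note how the prompt positions are sliced off at the end, which is exactly why the decoder never gets to attend to the encoded soft tokens:

import torch

def patch_t5_encoder(model, prompt, prefix_length):
    encoder = model.get_encoder()
    original_encoder_forward = encoder.forward

    def encoder_forward(input_ids=None, attention_mask=None, **kwargs):
        # Prepend the soft prompt to the encoder's input embeddings.
        inputs_embeds = encoder.get_input_embeddings()(input_ids)
        batch_size = inputs_embeds.shape[0]
        prompt_embeds = prompt.unsqueeze(0).expand(batch_size, -1, -1)
        inputs_embeds = torch.cat([prompt_embeds, inputs_embeds], dim=1)
        if attention_mask is not None:
            prefix_mask = attention_mask.new_ones(batch_size, prefix_length)
            attention_mask = torch.cat([prefix_mask, attention_mask], dim=1)
        outputs = original_encoder_forward(
            inputs_embeds=inputs_embeds, attention_mask=attention_mask, **kwargs
        )
        # Drop the prompt positions again so the output length matches what the
        # decoder and the generation code expect; the encoded soft tokens are lost.
        outputs.last_hidden_state = outputs.last_hidden_state[:, prefix_length:, :]
        return outputs

    encoder.forward = encoder_forward
    return model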

dirkgr · Mar 16 '22 22:03

Just to resolve this chain of comments: I made it work with T5.

dirkgr · Mar 17 '22 17:03

This is ready for another review.

dirkgr · Nov 30 '22 20:11