Collapse reference+learner hydra heads when using LoRA
🚀 The feature, motivation, and pitch
With additive (delta-style) parameter-efficient tuning methods such as LoRA, we should be able to make a slightly more memory-efficient hydra architecture by using a single block that computes roughly `frozen_head + tunable_weights` for the learner/policy head's forward pass and simply `frozen_head` for the reference, instead of maintaining two separate heads.
CC @LouisCastricato and @cat-state, who pointed this out.
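For illustration, a minimal sketch of the idea (hypothetical module, not trlX's actual hydra implementation): a single head keeps the shared frozen weights plus a LoRA-style low-rank delta, and the reference pass just skips the delta rather than holding a second frozen copy of the head.

```python
import torch
import torch.nn as nn

class SharedLoRAHead(nn.Module):
    """One head serving both the learner/policy and the reference forward pass."""

    def __init__(self, hidden_size: int, vocab_size: int, rank: int = 8):
        super().__init__()
        # Shared frozen weights (what the separate reference head holds today)
        self.frozen_head = nn.Linear(hidden_size, vocab_size, bias=False)
        self.frozen_head.weight.requires_grad_(False)
        # Trainable LoRA-style low-rank delta
        self.lora_a = nn.Linear(hidden_size, rank, bias=False)
        self.lora_b = nn.Linear(rank, vocab_size, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # delta starts at zero

    def forward(self, hidden_states: torch.Tensor, reference: bool = False) -> torch.Tensor:
        out = self.frozen_head(hidden_states)
        if not reference:
            # Learner/policy pass: frozen_head + tunable_weights
            out = out + self.lora_b(self.lora_a(hidden_states))
        # Reference pass: frozen_head only
        return out
```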
Alternatives
No response
Additional context
No response
Haha, I was not aware that Aman proposed the same thing.
When will this feature be available?
I am not sure anyone has started on it yet. cc @jon-tow
Not yet. @glerzing is looking into the peft migration, which should make this very simple, as the package provides a context manager to disable adapters; we can then wrap reference-model calls with it:
https://github.com/huggingface/peft/blob/34027fe813756897767b9a6f19ae7f1c4c7b418c/src/peft/peft_model.py#L290-L299
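Roughly, the reference logits could then come from the same peft-wrapped model with the adapters switched off, something like this (a sketch with placeholder model and inputs, not the final trlX code):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"))

input_ids = torch.tensor([[0, 1, 2, 3]])

# Learner/policy forward pass: frozen base weights + trainable LoRA deltas
policy_logits = model(input_ids).logits

# Reference forward pass: adapters disabled, i.e. just the frozen base model
with model.disable_adapter():
    ref_logits = model(input_ids).logits
```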
Cool!!! Looking forward to this update. 👍
Sorry to make you wait, but it should take a few weeks to get this done. As a very rough estimate, I would say I may push a tested solution around May 10th. But I'm new here, so I don't know how long it would then take before a new release version.