Collapse reference+learner hydra heads when using LoRA
🚀 The feature, motivation, and pitch
With additive (delta-style) parameter-efficient tuning methods such as LoRA, we should be able to make a slightly more memory-efficient hydra architecture by using a single block that computes roughly `frozen_head + tunable_weights` for the learner/policy head's forward pass and simply `frozen_head` for the reference, instead of maintaining two separate heads.
CC @LouisCastricato and @cat-state, who pointed this out.
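For illustration, a minimal sketch of the idea (hypothetical module, not trlX's actual hydra implementation): a single head keeps the shared frozen weights plus a LoRA-style low-rank delta, and the reference pass just skips the delta rather than holding a second frozen copy of the head.

```python
import torch
import torch.nn as nn

class SharedLoRAHead(nn.Module):
    """One head serving both the learner/policy and the reference forward pass."""

    def __init__(self, hidden_size: int, vocab_size: int, rank: int = 8):
        super().__init__()
        # Shared frozen weights (what the separate reference head holds today)
        self.frozen_head = nn.Linear(hidden_size, vocab_size, bias=False)
        self.frozen_head.weight.requires_grad_(False)
        # Trainable LoRA-style low-rank delta
        self.lora_a = nn.Linear(hidden_size, rank, bias=False)
        self.lora_b = nn.Linear(rank, vocab_size, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # delta starts at zero

    def forward(self, hidden_states: torch.Tensor, reference: bool = False) -> torch.Tensor:
        out = self.frozen_head(hidden_states)
        if not reference:
            # Learner/policy pass: frozen_head + tunable_weights
            out = out + self.lora_b(self.lora_a(hidden_states))
        # Reference pass: frozen_head only
        return out
```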
Alternatives
No response
Additional context
No response
Haha, I was not aware that Aman proposed the same thing.
When will this feature be available?
I am not sure anyone has started on it yet. cc @jon-tow
Not yet. @glerzing is looking into the peft migration, which should make this very simple, as the package provides a context manager to disable adapters; we can then wrap reference-model calls with it:
https://github.com/huggingface/peft/blob/34027fe813756897767b9a6f19ae7f1c4c7b418c/src/peft/peft_model.py#L290-L299
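Roughly, the reference logits could then come from the same peft-wrapped model with the adapters switched off, something like this (a sketch with placeholder model and inputs, not the final trlX code):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"))

input_ids = torch.tensor([[0, 1, 2, 3]])

# Learner/policy forward pass: frozen base weights + trainable LoRA deltas
policy_logits = model(input_ids).logits

# Reference forward pass: adapters disabled, i.e. just the frozen base model
with model.disable_adapter():
    ref_logits = model(input_ids).logits
```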
Cool!!! Looking forward to this update. 👍
Sorry to make you wait, but it should take a few weeks to get this done. As a very rough estimate, I would say I may push a tested solution around May 10th. But I'm new here, so I don't know how long it would then take before a new release version.