
The authors should consider changing the term “adapter” to avoid potential confusion with adapter-tuning.

Open · henryzhongsc opened this issue on Feb 24, 2023 · 0 comments

Great work with an elegant yet effective idea! Thanks for sharing. I do have one minor suggestion, however.

It is well known that, in the LLM finetuning paradigm, adapter-tuning [1] is a popular approach: lightweight modules are inserted between transformer layers, and only these modules are updated for downstream tasks. In this work, the "adapters" the authors refer to are not such modules, but rather a selection of layers from the pretrained model itself. The authors are clearly aware of this term overlap, as there are even combined experiments on offsite-tuning + adapter-tuning (Table 5).
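To make the distinction concrete, here is a rough PyTorch-style sketch; the class, function, and parameter names below are my own illustrations, not from either paper:

```python
import torch.nn as nn

# Houlsby-style adapter-tuning [1]: a small bottleneck module inserted
# inside each transformer layer; only these modules are trained downstream.
class BottleneckAdapter(nn.Module):
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, x):
        # residual bottleneck update around the frozen transformer sublayer
        return x + self.up(self.act(self.down(x)))


# Offsite-tuning's "adapter": a plain subset of the pretrained model's own
# transformer blocks (e.g. a few at the top and bottom), trained around a
# frozen emulator of the middle layers. Split sizes here are hypothetical.
def split_for_offsite_tuning(blocks: nn.ModuleList, n_head: int = 2, n_tail: int = 2):
    trainable_adapter = list(blocks[:n_head]) + list(blocks[-n_tail:])
    frozen_middle = list(blocks[n_head:-n_tail])  # to be replaced by an emulator
    return trainable_adapter, frozen_middle
```

As the sketch shows, the first "adapter" is an added module, while the second is a slice of the original network, which is exactly where the terminology collides.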

Given that both approaches fall within the realm of parameter-efficient finetuning, I'd encourage the authors to find an alternative term for their "adapter" to avoid potential confusion and ambiguity.

A couple of preliminary examples I can come up with are “bridging/pluggable/relay/alignment/shared + layers/units/components.” Hope it helps!

[1] Houlsby et al., Parameter-efficient transfer learning for NLP. ICML 2019.

henryzhongsc commented on Feb 24, 2023