AdapterConfig's leave_out does not work correctly in EncoderDecoderModel
## Environment info

- adapter-transformers version: 3.1.0
- Platform: Ubuntu 18.04 (Linux-5.4.0-87-generic-x86_64-with-glibc2.27)
- Python version: 3.9.13
- PyTorch version (GPU?): 1.13.1 (GPU)
- Tensorflow version (GPU?): False
- Using GPU in script?: True
- Using distributed or parallel set-up in script?: Yes
## Information
Model I am using (Bert, XLNet ...): EncoderDecoderModel
Language I am using the model on (English, Chinese ...): English
Adapter setup I am using (if any): AdapterConfig
The problem arises when using:
- [ ] the official example scripts: (give details below)
- [x] my own modified scripts: (give details below)
The task I am working on is:
- [ ] an official GLUE/SQUaD task: (give the name)
- [x] my own task or dataset: (give details below)
## To reproduce
from transformers import EncoderDecoderModel, AdapterConfig
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")
When no layers are left out, adapters are added as expected.
### no leave_out
adapter_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu")
model.add_adapter("en", adapter_config)
model.add_adapter("de", adapter_config)
print(model.adapter_summary())
#### print result
| Name       | Architecture | #Param      | %Param  | Active | Train |
|------------|--------------|-------------|---------|--------|-------|
| en         | bottleneck   | 7,100,928   | 2.871   | 0      | 1     |
| de         | bottleneck   | 7,100,928   | 2.871   | 0      | 1     |
| Full model |              | 247,363,386 | 100.000 |        | 1     |
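As a sanity check, 7,100,928 corresponds exactly to 12 layers of bottleneck adapters (two adapter locations per layer, reduction factor 4 on a hidden size of 768), i.e. only the encoder stack. A rough back-of-the-envelope calculation, assuming the usual down-projection/up-projection layout with biases:

```python
# Rough estimate, assuming one down- and one up-projection (each with bias) per adapter
# and two adapter locations per layer (mh_adapter + output_adapter).
hidden, bottleneck, layers = 768, 768 // 4, 12
per_adapter = (hidden * bottleneck + bottleneck) + (bottleneck * hidden + hidden)
print(2 * layers * per_adapter)  # 7100928 -> matches the summary above
```

So even without leave_out, the adapters seem to end up only in the encoder's 12 layers, not in the decoder.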
When leaving out all 12 encoder layers, no adapters are added at all, not even to the decoder.
### leave_out first 12 layers of encoder
adapter_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu", leave_out=list(range(12)))
model.add_adapter("en", adapter_config, overwrite_ok=True)
model.add_adapter("de", adapter_config, overwrite_ok=True)
print(model.adapter_summary())
#### print result
| Name       | Architecture | #Param      | %Param  | Active | Train |
|------------|--------------|-------------|---------|--------|-------|
| en         | bottleneck   | 0           | 0.000   | 0      | 1     |
| de         | bottleneck   | 0           | 0.000   | 0      | 1     |
| Full model |              | 247,363,386 | 100.000 |        | 1     |
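To see which layer ids the leave_out list is actually matched against, one can iterate the layers the same way the library does. A quick diagnostic sketch, assuming iter_layers is exposed on the model via EncoderDecoderModelAdaptersMixin:

```python
# Diagnostic sketch: print the layer ids that leave_out indices are compared with.
# Assumes model.iter_layers() is available (provided by EncoderDecoderModelAdaptersMixin).
for layer_id, layer in model.iter_layers():
    print(layer_id, type(layer).__name__)
```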
When only the first 6 encoder layers are left out, adapters are added only to the remaining encoder layers (6-11), while the decoder still gets none.
### leave_out first 6 layers of encoder
adapter_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu", leave_out=list(range(6)))
model.add_adapter("en", adapter_config, overwrite_ok=True)
model.add_adapter("de", adapter_config, overwrite_ok=True)
print(model.adapter_summary())
#### print result
| Name       | Architecture | #Param      | %Param  | Active | Train |
|------------|--------------|-------------|---------|--------|-------|
| en         | bottleneck   | 3,550,464   | 1.435   | 0      | 1     |
| de         | bottleneck   | 3,550,464   | 1.435   | 0      | 1     |
| Full model |              | 247,363,386 | 100.000 |        | 1     |
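The 3,550,464 count again matches exactly six layers of adapters (6 × 2 × 295,872), consistent with the parameter names listed below, which all sit in encoder.encoder.layer.6 through 11.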
#### check parameter

print([name for name, p in model.named_parameters() if "adapter" in name])

##### print result
['encoder.encoder.layer.6.attention.output.adapters.en.adapter_down.0.weight',
'encoder.encoder.layer.6.attention.output.adapters.en.adapter_down.0.bias',
'encoder.encoder.layer.6.attention.output.adapters.en.adapter_up.weight',
'encoder.encoder.layer.6.attention.output.adapters.en.adapter_up.bias',
'encoder.encoder.layer.6.attention.output.adapters.de.adapter_down.0.weight',
'encoder.encoder.layer.6.attention.output.adapters.de.adapter_down.0.bias',
'encoder.encoder.layer.6.attention.output.adapters.de.adapter_up.weight',
'encoder.encoder.layer.6.attention.output.adapters.de.adapter_up.bias',
'encoder.encoder.layer.6.output.adapters.en.adapter_down.0.weight',
'encoder.encoder.layer.6.output.adapters.en.adapter_down.0.bias',
'encoder.encoder.layer.6.output.adapters.en.adapter_up.weight',
'encoder.encoder.layer.6.output.adapters.en.adapter_up.bias',
'encoder.encoder.layer.6.output.adapters.de.adapter_down.0.weight',
'encoder.encoder.layer.6.output.adapters.de.adapter_down.0.bias',
'encoder.encoder.layer.6.output.adapters.de.adapter_up.weight',
'encoder.encoder.layer.6.output.adapters.de.adapter_up.bias',
'encoder.encoder.layer.7.attention.output.adapters.en.adapter_down.0.weight',
'encoder.encoder.layer.7.attention.output.adapters.en.adapter_down.0.bias',
'encoder.encoder.layer.7.attention.output.adapters.en.adapter_up.weight',
'encoder.encoder.layer.7.attention.output.adapters.en.adapter_up.bias',
'encoder.encoder.layer.7.attention.output.adapters.de.adapter_down.0.weight',
'encoder.encoder.layer.7.attention.output.adapters.de.adapter_down.0.bias',
'encoder.encoder.layer.7.attention.output.adapters.de.adapter_up.weight',
'encoder.encoder.layer.7.attention.output.adapters.de.adapter_up.bias',
'encoder.encoder.layer.7.output.adapters.en.adapter_down.0.weight',
'encoder.encoder.layer.7.output.adapters.en.adapter_down.0.bias',
'encoder.encoder.layer.7.output.adapters.en.adapter_up.weight',
'encoder.encoder.layer.7.output.adapters.en.adapter_up.bias',
'encoder.encoder.layer.7.output.adapters.de.adapter_down.0.weight',
'encoder.encoder.layer.7.output.adapters.de.adapter_down.0.bias',
'encoder.encoder.layer.7.output.adapters.de.adapter_up.weight',
'encoder.encoder.layer.7.output.adapters.de.adapter_up.bias',
'encoder.encoder.layer.8.attention.output.adapters.en.adapter_down.0.weight',
'encoder.encoder.layer.8.attention.output.adapters.en.adapter_down.0.bias',
'encoder.encoder.layer.8.attention.output.adapters.en.adapter_up.weight',
'encoder.encoder.layer.8.attention.output.adapters.en.adapter_up.bias',
'encoder.encoder.layer.8.attention.output.adapters.de.adapter_down.0.weight',
'encoder.encoder.layer.8.attention.output.adapters.de.adapter_down.0.bias',
'encoder.encoder.layer.8.attention.output.adapters.de.adapter_up.weight',
'encoder.encoder.layer.8.attention.output.adapters.de.adapter_up.bias',
'encoder.encoder.layer.8.output.adapters.en.adapter_down.0.weight',
'encoder.encoder.layer.8.output.adapters.en.adapter_down.0.bias',
'encoder.encoder.layer.8.output.adapters.en.adapter_up.weight',
'encoder.encoder.layer.8.output.adapters.en.adapter_up.bias',
'encoder.encoder.layer.8.output.adapters.de.adapter_down.0.weight',
'encoder.encoder.layer.8.output.adapters.de.adapter_down.0.bias',
'encoder.encoder.layer.8.output.adapters.de.adapter_up.weight',
'encoder.encoder.layer.8.output.adapters.de.adapter_up.bias',
'encoder.encoder.layer.9.attention.output.adapters.en.adapter_down.0.weight',
'encoder.encoder.layer.9.attention.output.adapters.en.adapter_down.0.bias',
'encoder.encoder.layer.9.attention.output.adapters.en.adapter_up.weight',
'encoder.encoder.layer.9.attention.output.adapters.en.adapter_up.bias',
'encoder.encoder.layer.9.attention.output.adapters.de.adapter_down.0.weight',
'encoder.encoder.layer.9.attention.output.adapters.de.adapter_down.0.bias',
'encoder.encoder.layer.9.attention.output.adapters.de.adapter_up.weight',
'encoder.encoder.layer.9.attention.output.adapters.de.adapter_up.bias',
'encoder.encoder.layer.9.output.adapters.en.adapter_down.0.weight',
'encoder.encoder.layer.9.output.adapters.en.adapter_down.0.bias',
'encoder.encoder.layer.9.output.adapters.en.adapter_up.weight',
'encoder.encoder.layer.9.output.adapters.en.adapter_up.bias',
'encoder.encoder.layer.9.output.adapters.de.adapter_down.0.weight',
'encoder.encoder.layer.9.output.adapters.de.adapter_down.0.bias',
'encoder.encoder.layer.9.output.adapters.de.adapter_up.weight',
'encoder.encoder.layer.9.output.adapters.de.adapter_up.bias',
'encoder.encoder.layer.10.attention.output.adapters.en.adapter_down.0.weight',
'encoder.encoder.layer.10.attention.output.adapters.en.adapter_down.0.bias',
'encoder.encoder.layer.10.attention.output.adapters.en.adapter_up.weight',
'encoder.encoder.layer.10.attention.output.adapters.en.adapter_up.bias',
'encoder.encoder.layer.10.attention.output.adapters.de.adapter_down.0.weight',
'encoder.encoder.layer.10.attention.output.adapters.de.adapter_down.0.bias',
'encoder.encoder.layer.10.attention.output.adapters.de.adapter_up.weight',
'encoder.encoder.layer.10.attention.output.adapters.de.adapter_up.bias',
'encoder.encoder.layer.10.output.adapters.en.adapter_down.0.weight',
'encoder.encoder.layer.10.output.adapters.en.adapter_down.0.bias',
'encoder.encoder.layer.10.output.adapters.en.adapter_up.weight',
'encoder.encoder.layer.10.output.adapters.en.adapter_up.bias',
'encoder.encoder.layer.10.output.adapters.de.adapter_down.0.weight',
'encoder.encoder.layer.10.output.adapters.de.adapter_down.0.bias',
'encoder.encoder.layer.10.output.adapters.de.adapter_up.weight',
'encoder.encoder.layer.10.output.adapters.de.adapter_up.bias',
'encoder.encoder.layer.11.attention.output.adapters.en.adapter_down.0.weight',
'encoder.encoder.layer.11.attention.output.adapters.en.adapter_down.0.bias',
'encoder.encoder.layer.11.attention.output.adapters.en.adapter_up.weight',
'encoder.encoder.layer.11.attention.output.adapters.en.adapter_up.bias',
'encoder.encoder.layer.11.attention.output.adapters.de.adapter_down.0.weight',
'encoder.encoder.layer.11.attention.output.adapters.de.adapter_down.0.bias',
'encoder.encoder.layer.11.attention.output.adapters.de.adapter_up.weight',
'encoder.encoder.layer.11.attention.output.adapters.de.adapter_up.bias',
'encoder.encoder.layer.11.output.adapters.en.adapter_down.0.weight',
'encoder.encoder.layer.11.output.adapters.en.adapter_down.0.bias',
'encoder.encoder.layer.11.output.adapters.en.adapter_up.weight',
'encoder.encoder.layer.11.output.adapters.en.adapter_up.bias',
'encoder.encoder.layer.11.output.adapters.de.adapter_down.0.weight',
'encoder.encoder.layer.11.output.adapters.de.adapter_down.0.bias',
'encoder.encoder.layer.11.output.adapters.de.adapter_up.weight',
'encoder.encoder.layer.11.output.adapters.de.adapter_up.bias']
## Expected behavior

The EncoderDecoderModel class should work like BART-like models. Also, it seems EncoderDecoderModelAdaptersMixin.iter_layers should count decoder layer ids starting from the number of encoder layers, like this:
def iter_layers(self) -> Iterable[Tuple[int, nn.Module]]:
    for i, layer in self.encoder.iter_layers():
        yield i, layer
    encoder_layer_n = len(self.encoder.encoder.layer)
    for i, layer in self.decoder.iter_layers():
        yield i + encoder_layer_n, layer
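Under that indexing, encoder layers would get ids 0-11 and decoder layers ids 12-23, so leaving out the whole encoder while still adapting the decoder would look like this (a sketch of the intended behavior, not of what currently happens):

```python
# Sketch of the intended behavior under the proposed global indexing
# (encoder layers -> ids 0-11, decoder layers -> ids 12-23).
adapter_config = AdapterConfig(
    mh_adapter=True,
    output_adapter=True,
    reduction_factor=4,
    non_linearity="gelu",
    leave_out=list(range(12)),  # skip only the encoder layers
)
model.add_adapter("en", adapter_config, overwrite_ok=True)
print(model.adapter_summary())  # should then report adapters in the 12 decoder layers
```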
Hey @ZeguanXiao, I see why this is unexpected behavior. Unfortunately, it is not as easy as changing the iter_layers indices. I will look into this.
@hSterz My current workaround is setting model.decoder.base_model.config.adapters = model.encoder.base_model.config.adapters and changing iter_layers as proposed above. It seems to work fine.
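For reference, a rough sketch of that workaround (assuming the iter_layers patch from the Expected behavior section is applied first):

```python
# Workaround sketch: share the encoder's adapter config registry with the decoder,
# then re-add the adapters so they are created in both stacks.
# Assumes iter_layers has been patched as proposed above.
model.decoder.base_model.config.adapters = model.encoder.base_model.config.adapters
model.add_adapter("en", adapter_config, overwrite_ok=True)
model.add_adapter("de", adapter_config, overwrite_ok=True)
print(model.adapter_summary())
```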
This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.