
AdapterConfig's leave_out does not work correctly in EncoderDecoderModel

Open ZeguanXiao opened this issue 2 years ago • 4 comments

Environment info

  • adapter-transformers version: 3.1.0
  • Platform: Ubuntu 18.04 (Linux-5.4.0-87-generic-x86_64-with-glibc2.27)
  • Python version: Python 3.9.13
  • PyTorch version (GPU?): 1.13.1 (GPU)
  • Tensorflow version (GPU?): False
  • Using GPU in script?: True
  • Using distributed or parallel set-up in script?: Yes

Information

Model I am using (Bert, XLNet ...): EncoderDecoderModel

Language I am using the model on (English, Chinese ...): English

Adapter setup I am using (if any): AdapterConfig

The problem arises when using:

  • [ ] the official example scripts: (give details below)
  • [x] my own modified scripts: (give details below)

The tasks I am working on is:

  • [ ] an official GLUE/SQuAD task: (give the name)
  • [x] my own task or dataset: (give details below)

To reproduce

from transformers import EncoderDecoderModel, AdapterConfig
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")

When no layers are left out, adapters are added as expected.

### no leave_out
adapter_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu")
model.add_adapter("en", adapter_config)
model.add_adapter("de", adapter_config)
print(model.adapter_summary())

#### print result
================================================================================
Name                     Architecture         #Param      %Param  Active   Train
--------------------------------------------------------------------------------
en                       bottleneck        7,100,928       2.871       0       1
de                       bottleneck        7,100,928       2.871       0       1
--------------------------------------------------------------------------------
Full model                               247,363,386     100.000               1
================================================================================

When leaving out all 12 encoder layers, no adapters are added anywhere, not even in the decoder.

### leave_out first 12 layers of encoder
adapter_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu", leave_out=list(range(12)))
model.add_adapter("en", adapter_config, overwrite_ok=True)
model.add_adapter("de", adapter_config, overwrite_ok=True)
print(model.adapter_summary())
##### print result
================================================================================
Name                     Architecture         #Param      %Param  Active   Train
--------------------------------------------------------------------------------
en                       bottleneck                0       0.000       0       1
de                       bottleneck                0       0.000       0       1
--------------------------------------------------------------------------------
Full model                               247,363,386     100.000               1
================================================================================

When leaving out only the first 6 encoder layers, adapters are added to the remaining encoder layers (6-11) only; the decoder still gets none.

### leave_out first 6 layers of encoder
adapter_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu", leave_out=list(range(6)))
model.add_adapter("en", adapter_config, overwrite_ok=True)
model.add_adapter("de", adapter_config, overwrite_ok=True)
print(model.adapter_summary())

##### print result
================================================================================
Name                     Architecture         #Param      %Param  Active   Train
--------------------------------------------------------------------------------
en                       bottleneck        3,550,464       1.435       0       1
de                       bottleneck        3,550,464       1.435       0       1
--------------------------------------------------------------------------------
Full model                               247,363,386     100.000               1
================================================================================

#### check parameters
print([name for name, p in model.named_parameters() if "adapter" in name])
##### print result
['encoder.encoder.layer.6.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.6.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.6.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.6.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.6.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.6.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.6.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.6.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.6.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.6.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.6.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.6.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.6.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.6.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.6.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.6.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.7.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.7.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.7.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.7.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.7.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.7.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.7.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.7.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.7.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.7.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.7.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.7.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.7.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.7.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.7.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.7.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.8.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.8.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.8.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.8.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.8.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.8.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.8.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.8.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.8.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.8.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.8.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.8.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.8.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.8.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.8.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.8.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.9.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.9.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.9.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.9.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.9.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.9.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.9.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.9.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.9.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.9.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.9.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.9.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.9.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.9.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.9.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.9.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.10.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.10.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.10.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.10.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.10.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.10.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.10.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.10.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.10.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.10.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.10.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.10.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.10.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.10.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.10.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.10.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.11.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.11.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.11.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.11.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.11.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.11.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.11.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.11.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.11.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.11.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.11.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.11.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.11.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.11.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.11.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.11.output.adapters.de.adapter_up.bias']

Expected behavior

The EncoderDecoderModel class should handle leave_out the way BART-like models do: encoder and decoder layers should share one joint index range, so leaving out all encoder layers should still add adapters to every decoder layer.
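For comparison, here is a minimal sketch of the BART-style behavior I would expect (assuming facebook/bart-base with 6 encoder and 6 decoder layers; the snippet is illustrative, not output from a run):

from transformers import BartModel, AdapterConfig

# Sketch only: in BART-like models, encoder layers are indexed 0-5 and decoder
# layers 6-11, so leave_out=list(range(6)) skips the encoder but still adds
# adapters to every decoder layer.
bart = BartModel.from_pretrained("facebook/bart-base")
bart_config = AdapterConfig(mh_adapter=True, output_adapter=True,
                            reduction_factor=4, non_linearity="gelu",
                            leave_out=list(range(6)))
bart.add_adapter("en", bart_config)
print(bart.adapter_summary())
print([name for name, p in bart.named_parameters() if "adapter" in name][:4])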

ZeguanXiao avatar Jan 11 '23 06:01 ZeguanXiao

Also, it seems EncoderDecoderModelAdaptersMixin.iter_layers should count the decoder layer_id starting at the number of encoder layers, like this:

    def iter_layers(self) -> Iterable[Tuple[int, nn.Module]]:
        # Encoder layers keep their original indices (0 .. n_enc - 1).
        for i, layer in self.encoder.iter_layers():
            yield i, layer

        # Offset decoder indices by the number of encoder layers so that
        # leave_out can address encoder and decoder layers jointly.
        encoder_layer_n = len(self.encoder.encoder.layer)
        for i, layer in self.decoder.iter_layers():
            yield i + encoder_layer_n, layer
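With this, a bert2bert model should expose 24 joint layer indices (encoder 0-11, decoder 12-23), so leave_out=list(range(12)) would skip only the encoder. A rough sanity check, assuming the patched method above:

# Rough sanity check, assuming the patched iter_layers above is in place.
print([i for i, _ in model.iter_layers()])
# expected: [0, 1, ..., 23] for a bert2bert model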

ZeguanXiao avatar Jan 11 '23 17:01 ZeguanXiao

Hey @ZeguanXiao, I see why this is unexpected behavior. Unfortunately, it is not as easy as changing the iter_layers indices. I will look into this.

hSterz avatar Jan 13 '23 11:01 hSterz

@hSterz My current workaround is setting model.decoder.base_model.config.adapters = model.encoder.base_model.config.adapters and patching iter_layers as above. It seems to work fine.
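For reference, a rough sketch of that workaround, assuming the bert2bert model from above (this is an unofficial monkey-patch of the model instance, not library API, and it has to be applied before calling add_adapter):

import types
from typing import Iterable, Tuple
from torch import nn

def iter_layers_patched(self) -> Iterable[Tuple[int, nn.Module]]:
    # Encoder layers keep their indices; decoder layers are offset so that
    # leave_out addresses encoder and decoder layers jointly.
    for i, layer in self.encoder.iter_layers():
        yield i, layer
    offset = len(self.encoder.encoder.layer)
    for i, layer in self.decoder.iter_layers():
        yield i + offset, layer

# Share one adapter config container between encoder and decoder ...
model.decoder.base_model.config.adapters = model.encoder.base_model.config.adapters
# ... and bind the patched method to this instance (emulates editing the mixin,
# assuming the library looks up iter_layers on self).
model.iter_layers = types.MethodType(iter_layers_patched, model)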

ZeguanXiao avatar Jan 13 '23 12:01 ZeguanXiao

This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.

adapter-hub-bert avatar Apr 14 '23 06:04 adapter-hub-bert