Generic adapters implementation
Closes #2526, #2534
Here is a proposal for how we can add adapters (including LoRA) to the toolkit. This branch is based on #2534, and it also implements flexible layer selection and small checkpoints.
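For reference, here is a rough Python equivalent of the YAML usage shown further down in this thread; the argument names (`model_to_adapt`, `adapter_class`, `all_linear`, `adapter_kwargs`) are taken from that example and may differ from the final merged API:

```python
from speechbrain.lobes.models.huggingface_transformers.whisper import Whisper
from speechbrain.nnet.adapters import AdaptedModel, LoRA

# Wrap a pretrained model so that every linear layer gets a LoRA adapter;
# only the adapter parameters need to go into the (small) checkpoint.
whisper = Whisper("openai/whisper-small.en", save_path=".")
adapted = AdaptedModel(
    model_to_adapt=whisper,
    adapter_class=LoRA,
    all_linear=True,
    adapter_kwargs={"rank": 16},
)
```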
There are a few more things that would be nice to have, but I personally don't think they're necessary before merge:
- a `merge_and_unload()`-type function for LoRA-type layers that reintegrates the adapter weights into the original model (see the sketch after this list)
- the capability to use adapters from the `peft` library -- they have an extensive collection that will probably be updated regularly
- more adapter types
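For the first item, merging a LoRA layer just means folding the low-rank update back into the frozen weight, W' = W + scaling * (B @ A). A minimal sketch of such a hypothetical helper (not part of this PR, and the scaling convention is an assumption):

```python
import torch

def merge_lora_into_linear(linear, lora_A, lora_B, scaling=1.0):
    """Hypothetical helper: fold LoRA weights back into a frozen nn.Linear.

    linear.weight: (out_features, in_features)
    lora_A:        (rank, in_features)
    lora_B:        (out_features, rank)
    """
    with torch.no_grad():
        linear.weight.add_((lora_B @ lora_A) * scaling)
    return linear
```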
If anyone thinks these are urgent we can work on adding them to this PR.
Hey Peter, I think using peft could be a nice addition to this PR. Not critical, but a nice one! I will have a look at the code, thanks for the work!
edit: I looked at the code and like it, see my minor comments. @poonehmousavi may want to try it and report how it goes. We should get the input of @Adel-Moumen and @asumagic as well, just as a matter of wisdom.
I tried this recipe with a peft layer and it just worked to my amazement. Here's the exact change I made:
```diff
 whisper: !new:speechbrain.nnet.adapters.AdaptedModel
     model_to_adapt: !ref <whisper_pretrained>
-    adapter_class: !name:speechbrain.nnet.adapters.LoRA
-    rank: !ref <lora_rank>
+    adapter_class: !name:peft.tuners.lora.layer.Linear
     target_layers: [all-linear]
+    r: !ref <lora_rank>
+    adapter_name: lora
```
@pplantinga are the checkpointing features working as well with this easy peft adaptation? We should make sure it works with the Pretrainer too, not just checkpointing, I believe.
@Adel-Moumen @mravanelli I think we will want this in v1.0.1, and it looks ready to me?
@poonehmousavi could you review and test the code as mentioned? It looks ready to me. Thanks!
Sure. I will do it by tomorrow.
@pplantinga have you tested it with the Pretrainer used for interfaces? Also, have you checked how it works with quantization (like QLoRA)?
> have you tested it with the Pretrainer used for interfaces?
I tested this and it worked, but there were warnings due to loading only the trained params. I have fixed this now.
The yaml I used is here:
```yaml
whisper_hub: openai/whisper-small.en
lora_rank: 16
language: "english"
sample_rate: 16000

min_decode_ratio: 0.0
max_decode_ratio: 1.0
test_beam_size: 8

whisper_pretrained: !new:speechbrain.lobes.models.huggingface_transformers.whisper.Whisper
    source: !ref <whisper_hub>
    save_path: .
    language: !ref <language>
    task: "transcribe"
    sampling_rate: !ref <sample_rate>

whisper: !new:speechbrain.nnet.adapters.AdaptedModel
    model_to_adapt: !ref <whisper_pretrained>
    adapter_class: !name:speechbrain.nnet.adapters.LoRA
    all_linear: True
    adapter_kwargs:
        rank: !ref <lora_rank>

test_search: !new:speechbrain.decoders.seq2seq.S2SWhisperBeamSearcher
    module: [!ref <whisper>]
    min_decode_ratio: !ref <min_decode_ratio>
    max_decode_ratio: !ref <max_decode_ratio>
    beam_size: !ref <test_beam_size>

modules:
    whisper: !ref <whisper>
    decoder: !ref <test_search>

pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    loadables:
        whisper: !ref <whisper>
```
And python:
```python
model = sb.inference.ASR.WhisperASR.from_hparams(
    ".",
    "lora_pre.yaml",
    savedir="results/whisper/1987/save/CKPT+2024-06-05+18-30-33+00",
)
model.transcribe_file("speechbrain/asr-streaming-conformer-librispeech/test-en.wav")
```
> also, have you checked how it works with quantization (like QLoRA)?
I am not very familiar with QLoRA; it seems there's additional setup needed to get this to work.
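For context, QLoRA usually means loading the frozen base model in 4-bit (e.g. via bitsandbytes) before attaching LoRA adapters on top. A rough, untested sketch of that recipe using HuggingFace transformers and peft directly (not the AdaptedModel wrapper from this PR; the model name, target modules, and hyperparameters are only placeholders):

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the frozen base model with 4-bit NF4 quantization (bitsandbytes).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-small.en", quantization_config=bnb_config
)

# Make the quantized model trainable, then attach LoRA adapters.
base = prepare_model_for_kbit_training(base)
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```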
One epoch (100h) results for Whisper Small.en; the published results are test-clean=3.05 and test-other=7.53:

```
speechbrain.utils.train_logger - Epoch loaded: 1 - test loss: 9.73e-01, test CER: 1.03, test WER: 2.81
speechbrain.utils.train_logger - Epoch loaded: 1 - test loss: 9.86e-01, test CER: 1.08, test WER: 2.90
speechbrain.utils.train_logger - Epoch loaded: 1 - test loss: 1.22, test CER: 3.00, test WER: 6.57
```
@pplantinga Should we merge this, maybe with a small tutorial somewhere as well?
There are more features that could be added but I think this is ready for merge as-is and the rest can be added later.