
Generic adapters implementation

Open pplantinga opened this issue 1 year ago • 9 comments

Closes #2526, #2534

Here is a proposal for how we can add adapters (including LoRA) to the toolkit. This branch is based on #2534, and it also implements flexible layer selection and small checkpoints.

There are a few more things that would be nice to have, but I personally don't think they're necessary before merging.

  • a merge_and_unload() type function for LoRA-type layers that reintegrates the adapter weights to the original model
  • the capability to use adapters from the peft library -- they have an extensive collection that will probably be updated regularly
  • more adapter types

If anyone thinks these are urgent we can work on adding them to this PR.
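
For illustration, the merge_and_unload() idea for LoRA-type layers boils down to folding the low-rank update into the base weight, W' = W + (alpha/r) * B A, so the adapter can be dropped entirely at inference time. Here is a NumPy sketch of that math (function names are hypothetical, not this PR's API):

```python
# Hypothetical sketch of LoRA merging, using NumPy for illustration only.
import numpy as np

def lora_forward(x, W, A, B, alpha, rank):
    """Adapted forward pass: base projection plus scaled low-rank update."""
    return x @ W.T + (alpha / rank) * (x @ (B @ A).T)

def merge_lora(W, A, B, alpha, rank):
    """Fold the adapter into the base weight so the adapter can be removed."""
    return W + (alpha / rank) * (B @ A)

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2
W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in))           # LoRA down-projection
B = rng.normal(size=(d_out, r)) * 0.01   # LoRA up-projection (near zero-init)
x = rng.normal(size=(4, d_in))

y_adapted = lora_forward(x, W, A, B, alpha=16, rank=r)
y_merged = x @ merge_lora(W, A, B, alpha=16, rank=r).T
assert np.allclose(y_adapted, y_merged)  # merged model matches the adapted one
```

The key property is that the merged weight reproduces the adapted forward pass exactly, which is why the adapter can be "unloaded" after merging.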

pplantinga avatar Jun 05 '24 01:06 pplantinga

Hey Peter, I would think that using peft could be a nice addition to this PR. Not critical, but a nice addition! I will have a look at the code, thanks for the work!

edit: I looked at the code and like it; see my minor comments. @poonehmousavi may want to try it, to say how it goes. We should get the input of @Adel-Moumen and @asumagic as well, just as a matter of wisdom.

TParcollet avatar Jun 05 '24 08:06 TParcollet

I tried this recipe with a peft layer and it just worked to my amazement. Here's the exact change I made:

 whisper: !new:speechbrain.nnet.adapters.AdaptedModel
     model_to_adapt: !ref <whisper_pretrained>
-    adapter_class: !name:speechbrain.nnet.adapters.LoRA
-    rank: !ref <lora_rank>
+    adapter_class: !name:peft.tuners.lora.layer.Linear
     target_layers: [all-linear]
+    r: !ref <lora_rank>
+    adapter_name: lora
 

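For readers unfamiliar with the target_layers: [all-linear] mechanism, the flexible layer selection can be thought of as walking the module tree and swapping every matching layer for an adapted wrapper. A toy, framework-free sketch of that idea (class and function names are made up for illustration, not the PR's code):

```python
# Illustrative sketch of "flexible layer selection": traverse named modules
# and replace targets ("all-linear" or name patterns) with adapter wrappers.
import fnmatch

class Module:
    """Minimal stand-in for a framework module with named children."""
    def __init__(self, **children):
        self.children = children

    def named_modules(self, prefix=""):
        for name, child in self.children.items():
            full = f"{prefix}.{name}" if prefix else name
            yield full, child
            if isinstance(child, Module):
                yield from child.named_modules(full)

class Linear:
    pass

class AdaptedLinear:
    def __init__(self, wrapped):
        self.wrapped = wrapped

def adapt_model(model, target_layers):
    """Swap every Linear (for 'all-linear') or pattern-matched layer."""
    replaced = []
    for name, module in list(model.named_modules()):
        is_target = ("all-linear" in target_layers and isinstance(module, Linear)) \
            or any(fnmatch.fnmatch(name, pat) for pat in target_layers)
        if is_target and isinstance(module, Linear):
            parent = model
            *path, leaf = name.split(".")
            for part in path:            # walk down to the direct parent
                parent = parent.children[part]
            parent.children[leaf] = AdaptedLinear(module)
            replaced.append(name)
    return replaced

model = Module(encoder=Module(fc=Linear(), attn=Module(proj=Linear())))
names = adapt_model(model, ["all-linear"])  # both Linear layers get wrapped
```

This is also why swapping in a peft layer class works: the selection logic only decides which layers to wrap, while the adapter class (and its kwargs) decides what to wrap them with.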
pplantinga avatar Jun 05 '24 22:06 pplantinga

@pplantinga are the checkpointing features working as well with this easy peft adaptation? We should make sure it works with the Pretrainer too, not just checkpointing, I believe.

TParcollet avatar Jun 06 '24 13:06 TParcollet

@Adel-Moumen @mravanelli I think we will want this in v1.0.1, and it looks ready to me?

TParcollet avatar Jul 08 '24 13:07 TParcollet

@poonehmousavi could you review and test the code as mentioned? It looks ready to me. Thanks!

TParcollet avatar Jul 08 '24 13:07 TParcollet

> @poonehmousavi could you review and test the code as mentioned? It looks ready to me. Thanks!

Sure. I will do it by tomorrow.

poonehmousavi avatar Jul 08 '24 13:07 poonehmousavi

@pplantinga have you tested it with the Pretrainer used in the inference interfaces? Also, have you checked how it works with quantization (like QLoRA)?

poonehmousavi avatar Jul 10 '24 00:07 poonehmousavi

> @pplantinga have you tested it with the Pretrainer used in the inference interfaces?

I tested this and it worked, but it produced warnings because only the trained parameters are loaded. I have fixed this now.
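
The fix amounts to saving only the trainable adapter parameters (a small checkpoint) and tolerating the missing base-model keys on load. A toy dict-based sketch of that behavior (function names are hypothetical, not SpeechBrain's actual API):

```python
# Illustration of "small checkpoints": persist only adapter parameters and
# treat base-model keys as expected-missing when restoring.
def save_adapter_only(state, trainable):
    """Keep only the trainable (adapter) entries -> a small checkpoint."""
    return {k: v for k, v in state.items() if k in trainable}

def load_partial(state, ckpt):
    """Apply a partial checkpoint; untouched keys keep base-model values."""
    missing = [k for k in state if k not in ckpt]  # expected: base weights
    state.update(ckpt)
    return missing

state = {"base.weight": 1.0, "adapter.lora_A": 0.5, "adapter.lora_B": 0.2}
ckpt = save_adapter_only(state, {"adapter.lora_A", "adapter.lora_B"})

fresh = {"base.weight": 1.0, "adapter.lora_A": 0.0, "adapter.lora_B": 0.0}
missing = load_partial(fresh, ckpt)  # only the base key is "missing"
```

Without treating the base keys as expected-missing, a strict loader would warn (or fail) on every frozen parameter, which matches the warnings described above.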

The yaml I used is here:

whisper_hub: openai/whisper-small.en
lora_rank: 16
language: "english"
sample_rate: 16000

min_decode_ratio: 0.0
max_decode_ratio: 1.0
test_beam_size: 8

whisper_pretrained: !new:speechbrain.lobes.models.huggingface_transformers.whisper.Whisper
    source: !ref <whisper_hub>
    save_path: .
    language: !ref <language>
    task: "transcribe"
    sampling_rate: !ref <sample_rate>

whisper: !new:speechbrain.nnet.adapters.AdaptedModel
    model_to_adapt: !ref <whisper_pretrained>
    adapter_class: !name:speechbrain.nnet.adapters.LoRA
    all_linear: True
    adapter_kwargs:
        rank: !ref <lora_rank>

test_search: !new:speechbrain.decoders.seq2seq.S2SWhisperBeamSearcher
    module: [!ref <whisper>]
    min_decode_ratio: !ref <min_decode_ratio>
    max_decode_ratio: !ref <max_decode_ratio>
    beam_size: !ref <test_beam_size>

modules:
    whisper: !ref <whisper>
    decoder: !ref <test_search>

pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    loadables:
        whisper: !ref <whisper>

And python:

model = sb.inference.ASR.WhisperASR.from_hparams(".", "lora_pre.yaml", savedir="results/whisper/1987/save/CKPT+2024-06-05+18-30-33+00")
model.transcribe_file("speechbrain/asr-streaming-conformer-librispeech/test-en.wav")

> Also, have you checked how it works with quantization (like QLoRA)?

I am not very familiar with QLoRA; it seems additional setup is needed to get this to work.
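
For context, the core idea of QLoRA is to keep the frozen base weights in a quantized format, dequantize them on the fly in the forward pass, and train a full-precision LoRA update on top. A rough NumPy illustration of that idea (naive absmax uniform quantization standing in for NF4; this is a conceptual sketch, not a real 4-bit kernel):

```python
# Conceptual QLoRA sketch: quantized frozen base + full-precision LoRA update.
import numpy as np

def quantize(W, n_levels=16):
    """Naive absmax uniform quantization to n_levels (stand-in for NF4)."""
    scale = np.abs(W).max() / (n_levels // 2 - 1)
    q = np.round(W / scale).astype(np.int8)  # small integer codes
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float64) * scale

def qlora_forward(x, q, scale, A, B, alpha, rank):
    W_hat = dequantize(q, scale)  # frozen base, reconstructed on the fly
    return x @ W_hat.T + (alpha / rank) * (x @ (B @ A).T)

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 8))
q, scale = quantize(W)                  # stored instead of the float weight
A = rng.normal(size=(2, 8))             # trainable, full precision
B = rng.normal(size=(6, 2)) * 0.01      # trainable, full precision
x = rng.normal(size=(4, 8))
y = qlora_forward(x, q, scale, A, B, alpha=16, rank=2)
```

The "additional setup" in practice would include a real 4-bit backend (e.g. bitsandbytes), gradient flow through the dequantized weights, and paged optimizers, which is likely why it does not work out of the box here.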

pplantinga avatar Jul 13 '24 17:07 pplantinga

Results after one epoch (100h) for Whisper Small.en; the published results are test-clean = 3.05 and test-other = 7.53:

speechbrain.utils.train_logger - Epoch loaded: 1 - test loss: 9.73e-01, test CER: 1.03, test WER: 2.81
speechbrain.utils.train_logger - Epoch loaded: 1 - test loss: 9.86e-01, test CER: 1.08, test WER: 2.90
speechbrain.utils.train_logger - Epoch loaded: 1 - test loss: 1.22, test CER: 3.00, test WER: 6.57

pplantinga avatar Jul 20 '24 18:07 pplantinga

@pplantinga should we merge this? Maybe with a small tutorial somewhere as well?

TParcollet avatar Sep 04 '24 07:09 TParcollet

There are more features that could be added, but I think this is ready to merge as-is; the rest can be added later.

pplantinga avatar Sep 04 '24 12:09 pplantinga