Generic adapters implementation
Closes #2526, #2534
Here is a proposal for how we can add adapters (including LoRA) to the toolkit. This branch is based on #2534, and it also implements flexible layer selection and small checkpoints.
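For reference, here is a rough Python equivalent of the YAML usage shown further down in this thread; the argument names (`model_to_adapt`, `adapter_class`, `all_linear`, `adapter_kwargs`) are taken from that example and may differ from the final merged API:

```python
from speechbrain.lobes.models.huggingface_transformers.whisper import Whisper
from speechbrain.nnet.adapters import AdaptedModel, LoRA

# Wrap a pretrained model so that every linear layer gets a LoRA adapter;
# only the adapter parameters need to go into the (small) checkpoint.
whisper = Whisper("openai/whisper-small.en", save_path=".")
adapted = AdaptedModel(
    model_to_adapt=whisper,
    adapter_class=LoRA,
    all_linear=True,
    adapter_kwargs={"rank": 16},
)
```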
There are a few more things that would be nice to have, but I personally don't think they're necessary before merge:
- a `merge_and_unload()`-type function for LoRA-type layers that reintegrates the adapter weights into the original model (see the sketch after this list)
- the capability to use adapters from the `peft` library -- they have an extensive collection that will probably be updated regularly
- more adapter types
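For the first item, merging a LoRA layer just means folding the low-rank update back into the frozen weight, W' = W + scaling * (B @ A). A minimal sketch of such a hypothetical helper (not part of this PR, and the scaling convention is an assumption):

```python
import torch

def merge_lora_into_linear(linear, lora_A, lora_B, scaling=1.0):
    """Hypothetical helper: fold LoRA weights back into a frozen nn.Linear.

    linear.weight: (out_features, in_features)
    lora_A:        (rank, in_features)
    lora_B:        (out_features, rank)
    """
    with torch.no_grad():
        linear.weight.add_((lora_B @ lora_A) * scaling)
    return linear
```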
If anyone thinks these are urgent we can work on adding them to this PR.
Hey Peter, I think using peft could be a nice addition to this PR. Not critical, but a nice one! I will have a look at the code, thanks for the work!
edit: I looked at the code and like it, see my minor comments. @poonehmousavi may want to try it and report how it goes. We should get the input of @Adel-Moumen and @asumagic as well, just as a matter of wisdom.
I tried this recipe with a peft layer and it just worked to my amazement. Here's the exact change I made:
```diff
 whisper: !new:speechbrain.nnet.adapters.AdaptedModel
     model_to_adapt: !ref <whisper_pretrained>
-    adapter_class: !name:speechbrain.nnet.adapters.LoRA
-    rank: !ref <lora_rank>
+    adapter_class: !name:peft.tuners.lora.layer.Linear
     target_layers: [all-linear]
+    r: !ref <lora_rank>
+    adapter_name: lora
```
@pplantinga are the checkpointing features working as well with this easy peft adaptation? We should make sure it works with the Pretrainer too, not just checkpointing, I believe.
@Adel-Moumen @mravanelli I think we will want this in v1.0.1, and it looks ready to me?
@poonehmousavi could you review and test the code as mentioned? It looks ready to me. Thanks!
Sure. I will do it by tomorrow.
@pplantinga have you tested it with the Pretrainer used for interfaces? Also, have you checked how it works with quantization (like QLoRA)?
> have you tested it with the Pretrainer used for interfaces?
I tested this and it worked, but there were warnings due to loading only the trained params. I have fixed this now.
The yaml I used is here:
```yaml
whisper_hub: openai/whisper-small.en
lora_rank: 16
language: "english"
sample_rate: 16000

min_decode_ratio: 0.0
max_decode_ratio: 1.0
test_beam_size: 8

whisper_pretrained: !new:speechbrain.lobes.models.huggingface_transformers.whisper.Whisper
    source: !ref <whisper_hub>
    save_path: .
    language: !ref <language>
    task: "transcribe"
    sampling_rate: !ref <sample_rate>

whisper: !new:speechbrain.nnet.adapters.AdaptedModel
    model_to_adapt: !ref <whisper_pretrained>
    adapter_class: !name:speechbrain.nnet.adapters.LoRA
    all_linear: True
    adapter_kwargs:
        rank: !ref <lora_rank>

test_search: !new:speechbrain.decoders.seq2seq.S2SWhisperBeamSearcher
    module: [!ref <whisper>]
    min_decode_ratio: !ref <min_decode_ratio>
    max_decode_ratio: !ref <max_decode_ratio>
    beam_size: !ref <test_beam_size>

modules:
    whisper: !ref <whisper>
    decoder: !ref <test_search>

pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    loadables:
        whisper: !ref <whisper>
```
And python:
```python
model = sb.inference.ASR.WhisperASR.from_hparams(
    ".",
    "lora_pre.yaml",
    savedir="results/whisper/1987/save/CKPT+2024-06-05+18-30-33+00",
)
model.transcribe_file("speechbrain/asr-streaming-conformer-librispeech/test-en.wav")
```
> also, have you checked how it works with quantization (like QLoRA)?
I am not very familiar with QLoRA; it seems there's additional setup needed to get this to work.
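For context, QLoRA usually means loading the frozen base model in 4-bit (e.g. via bitsandbytes) before attaching LoRA adapters on top. A rough, untested sketch of that recipe using HuggingFace transformers and peft directly (not the AdaptedModel wrapper from this PR; the model name, target modules, and hyperparameters are only placeholders):

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the frozen base model with 4-bit NF4 quantization (bitsandbytes).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-small.en", quantization_config=bnb_config
)

# Make the quantized model trainable, then attach LoRA adapters.
base = prepare_model_for_kbit_training(base)
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```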
One epoch (100h) results for Whisper Small.en; the published results are test-clean=3.05 and test-other=7.53:

```
speechbrain.utils.train_logger - Epoch loaded: 1 - test loss: 9.73e-01, test CER: 1.03, test WER: 2.81
speechbrain.utils.train_logger - Epoch loaded: 1 - test loss: 9.86e-01, test CER: 1.08, test WER: 2.90
speechbrain.utils.train_logger - Epoch loaded: 1 - test loss: 1.22, test CER: 3.00, test WER: 6.57
```
@pplantinga Should we merge this, maybe with a small tutorial somewhere as well?
There are more features that could be added but I think this is ready for merge as-is and the rest can be added later.