past_key_values for SeamlessM4Tv2ForSpeechToText is not working as expected

Open vapemaster-kz opened this issue 6 months ago • 3 comments

System Info

transformers version: 4.37.2, Python version: 3.8.6, OS: Windows 11

Who can help?

@sanchit-gandhi

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

I have segments of audio, and I would like to pass past_key_values between them. I expected the transcription quality to improve, but instead the output became unreadable.


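from transformers import AutoProcessor, SeamlessM4Tv2ForSpeechToText

processor = AutoProcessor.from_pretrained(path_to_model)
model = SeamlessM4Tv2ForSpeechToText.from_pretrained(path_to_model)
audio_chunks = [audio_segments]  # placeholder: the VAD-segmented audio chunks

past_key_values = None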

for i in range(5):
    audio_inputs = processor(audios=audio_chunks[i], return_tensors="pt", sampling_rate=16_000)

    output = model.generate(**audio_inputs, tgt_lang="rus", repetition_penalty=1.1, return_dict_in_generate=True, past_key_values=past_key_values)
    
    tmp_result = processor.decode(output[0][0], skip_special_tokens=True)
    past_key_values = output['past_key_values']

Expected behavior

The transcription quality should improve when I pass past_key_values (or at least match the quality obtained with past_key_values=None). The audio is the same in both cases. In other words, I had some audio, applied VAD to split it into segments, and then fed these segments to the model one by one.

vapemaster-kz avatar Feb 20 '24 11:02 vapemaster-kz

cc @ylacombe

amyeroberts avatar Feb 20 '24 11:02 amyeroberts

Gentle ping @sanchit-gandhi @ylacombe

amyeroberts avatar Apr 16 '24 18:04 amyeroberts

Another ping @sanchit-gandhi @ylacombe

amyeroberts avatar May 13 '24 09:05 amyeroberts

Hey @vapemaster-kz, I believe you're misunderstanding what past key values are used for!

As the docs explain here, past key values can be used to speed up sequential decoding.

Past key values are simply a cache: within a single generation pass, they let the decoder reuse the attention keys and values it has already computed for earlier tokens instead of recomputing them at every step.
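To make the caching point concrete, here is a minimal pure-Python sketch of single-head attention decoded with and without a key/value cache. It is a toy illustration, not the SeamlessM4T implementation: `project`, `attend`, and both `decode_*` functions are hypothetical names, and the sine-based projection merely stands in for learned weights. The cached variant produces the same outputs while computing each token's key and value only once.

```python
import math


def project(token, seed):
    """Hypothetical stand-in for a learned projection of a token (4 dims)."""
    return [math.sin(seed * (i + 1) * token) for i in range(4)]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over all keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def decode_without_cache(tokens):
    """Recompute keys and values for the whole prefix at every step."""
    outputs = []
    for step in range(1, len(tokens) + 1):
        prefix = tokens[:step]
        keys = [project(t, 1.0) for t in prefix]
        values = [project(t, 2.0) for t in prefix]
        outputs.append(attend(project(prefix[-1], 3.0), keys, values))
    return outputs


def decode_with_cache(tokens):
    """Append one key/value pair per step and reuse the earlier ones."""
    cached_keys, cached_values = [], []  # the toy "past key values"
    outputs = []
    for token in tokens:
        cached_keys.append(project(token, 1.0))
        cached_values.append(project(token, 2.0))
        outputs.append(attend(project(token, 3.0), cached_keys, cached_values))
    return outputs


tokens = [3, 1, 4, 1, 5]
cached = decode_with_cache(tokens)
uncached = decode_without_cache(tokens)
max_diff = max(abs(a - b)
               for row_a, row_b in zip(cached, uncached)
               for a, b in zip(row_a, row_b))
assert max_diff < 1e-12  # same results, less recomputation
```

The cache changes only how much work each step does, not what the model attends to, which is why it is a speed optimisation rather than a way to add context.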

I hope it helps!

ylacombe avatar May 13 '24 16:05 ylacombe
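
Following the reply above, a reasonable reading is that each VAD segment should be transcribed independently, letting generate build a fresh cache on every call. The sketch below assumes that interpretation; `transcribe_segments` is a hypothetical helper, not part of transformers, and it expects an already loaded model and processor like the ones in the reproduction.

```python
def transcribe_segments(model, processor, segments, tgt_lang="rus",
                        sampling_rate=16_000, **generate_kwargs):
    """Transcribe each audio segment on its own (hypothetical helper).

    Assumption: no state is shared between segments, so past_key_values
    is never passed to generate().
    """
    texts = []
    for segment in segments:
        inputs = processor(audios=segment, sampling_rate=sampling_rate,
                           return_tensors="pt")
        # generate() creates and discards its own cache for this call.
        token_ids = model.generate(**inputs, tgt_lang=tgt_lang,
                                   **generate_kwargs)
        texts.append(processor.decode(token_ids[0], skip_special_tokens=True))
    return texts
```

With the objects from the reproduction, this could be called as `transcribe_segments(model, processor, audio_chunks, repetition_penalty=1.1)`.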