Does ASR require specifying the "tgt_lang" parameter?

Open lilongwei5054 opened this issue 1 year ago • 1 comments

import torchaudio
from transformers import AutoProcessor, SeamlessM4Tv2Model

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

fileName = "asr.wav"
audio, orig_freq = torchaudio.load(fileName)
# Resample to the 16 kHz rate used in this snippet
audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16000)
audio_inputs = processor(audios=audio, return_tensors="pt")
# Unpack the processor output as keyword arguments
output_tokens = model.generate(**audio_inputs, tgt_lang="cmn", generate_speech=False)
translated_text_from_audio = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
# ASR result: 今天天气真不错

When I set tgt_lang="cmn", the result is correct. The original audio is in Chinese. But when I set tgt_lang=None, the ASR result is "The weather is really nice today". It has been translated into English!

I expected the model to detect the language of the audio automatically.

lilongwei5054, Dec 25 '23 03:12

ASR with Seamless is treated as a special case of translation, where the source and target languages are the same. But the Seamless models were not trained to predict the target language on their own, so it is your responsibility to provide the right tgt_lang tag.

avidale, Mar 14 '24 15:03
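
Following the reply above, a minimal sketch of how one might make the "ASR is translation into the same language" rule explicit in calling code. The helper names asr_generate_kwargs and transcribe are hypothetical and not part of transformers or seamless_communication; the model and processor are assumed to be loaded as in the original post, and the caller is assumed to already know the language spoken in the audio.

```python
from typing import Any


def asr_generate_kwargs(spoken_lang: str) -> dict:
    """Build generate() keyword arguments for ASR (hypothetical helper).

    ASR is treated as translation where source and target languages match,
    so the language spoken in the audio is passed as tgt_lang.
    """
    if not spoken_lang:
        # The model is not expected to pick the target language on its own.
        raise ValueError("ASR needs tgt_lang set to the language spoken in the audio")
    return {"tgt_lang": spoken_lang, "generate_speech": False}


def transcribe(model: Any, processor: Any, audio: Any, spoken_lang: str) -> str:
    """Transcribe audio, mirroring the calls from the original post (hypothetical helper)."""
    audio_inputs = processor(audios=audio, return_tensors="pt")
    output_tokens = model.generate(**audio_inputs, **asr_generate_kwargs(spoken_lang))
    return processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
```

With the objects from the original post, the call would look like transcribe(model, processor, audio, "cmn") for Chinese audio. Omitting the language raises an error up front instead of silently producing a translation.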