seamless_communication
Does ASR require specifying the "tgt_lang" parameter?
    import torchaudio
    from transformers import AutoProcessor, SeamlessM4Tv2Model

    processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
    model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

    fileName = "asr.wav"
    audio, orig_freq = torchaudio.load(fileName)
    # Resample the audio to 16 kHz before feature extraction
    audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16000)
    audio_inputs = processor(audios=audio, return_tensors="pt")
    # Unpack the processor output into keyword arguments for generate()
    output_tokens = model.generate(**audio_inputs, tgt_lang="cmn", generate_speech=False)
    translated_text_from_audio = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
    # ASR result: 今天天气真不错

The original audio is in Chinese. When I set tgt_lang="cmn", the result is correct. But when I set tgt_lang=None, the ASR result is "The weather is really nice today". It has been translated into English!
I expected the model to determine the language of the audio automatically.
ASR with Seamless is treated as a special case of translation, where the source and target languages are the same.
But the Seamless models were not trained to predict the target language on their own, so it is your responsibility to provide the right tgt_lang tag.
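To make the "same source and target language" rule harder to forget, the call from the question could be wrapped in a small helper. This is only a sketch: `transcribe` and its `src_lang` argument are hypothetical names, not part of transformers or seamless_communication, and it assumes `model` and `processor` are the objects loaded in the question and that `audio` is already resampled to 16 kHz.

```python
def transcribe(model, processor, audio, src_lang):
    """Run ASR by asking for output text in the language that is spoken.

    Hypothetical helper (not a library API). `src_lang` is the language of
    the audio, e.g. "cmn"; it is passed as tgt_lang so the model transcribes
    instead of translating.
    """
    if not src_lang:
        # The model does not detect the spoken language, so refuse to guess.
        raise ValueError("src_lang is required, e.g. 'cmn' for Mandarin audio")

    audio_inputs = processor(audios=audio, return_tensors="pt")
    output_tokens = model.generate(
        **audio_inputs,
        tgt_lang=src_lang,       # same as the spoken language -> transcription
        generate_speech=False,   # text output only
    )
    return processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
```

With the objects from the question, `transcribe(model, processor, audio, "cmn")` issues the same generate call that returned the correct Chinese transcript. If the language of the audio is not known in advance, it has to be determined by some other means before calling the model.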