
When using speech-to-text inference, how to keep src_lang the same as tgt_lang

Open angrysword opened this issue 1 year ago • 1 comments

Real-world speech often mixes two or more languages. For example, a Japanese speaker may have to use some English words. When we transcribe, we would like to keep the text as originally spoken. How can we do that?

Even with ASR, we still need to pass src_lang:

ASR
This is equivalent to S2TT with <tgt_lang>=<src_lang>.

transcribed_text, _, _ = translator.predict(<path_to_input_audio>, "asr", <src_lang>)

angrysword, Oct 03 '23 23:10

I would like to know this as well. How can we set the target language to the source language in M4Tv2? With audio, you often don't know the language. Is LID (language identification) already integrated in M4Tv2, or do I need to run it beforehand?

asusdisciple, Dec 05 '23 10:12

No, spoken language identification is not integrated in Seamless models. If you want it, you'll have to apply some external LID model, such as MMS. I provide more details in #325.

avidale, Mar 14 '24 15:03
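
For readers landing here, the two-step flow described above (external LID first, then ASR with the detected language) might be wired up roughly as follows. This is only a sketch under stated assumptions: `transcribe_in_source_language`, `detect_language`, `asr_predict`, and `LID_TO_SEAMLESS` are hypothetical names, not part of seamless_communication or MMS. The LID model and the Seamless call are passed in as plain callables, so nothing here assumes a particular model API.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical mapping from the labels an external LID model emits to the
# language codes Seamless expects. Whether the two code sets line up is an
# assumption; check and extend this table for your own models.
LID_TO_SEAMLESS: Dict[str, str] = {"eng": "eng", "jpn": "jpn"}


def transcribe_in_source_language(
    audio_path: str,
    detect_language: Callable[[str], str],
    asr_predict: Callable[[str, str], str],
    fallback_lang: Optional[str] = None,
) -> Tuple[str, str]:
    """Run LID first, then ASR with src_lang set to the detected language.

    detect_language: wraps an external LID model (for example MMS) and
        returns a language label for the audio file.
    asr_predict: wraps the Seamless ASR call, for example something like
        translator.predict(audio_path, "asr", src_lang) from the issue text.
    """
    lid_label = detect_language(audio_path)
    # Use the mapped code if known, otherwise the caller-supplied fallback.
    src_lang = LID_TO_SEAMLESS.get(lid_label, fallback_lang)
    if src_lang is None:
        raise ValueError(f"No Seamless language code known for LID label {lid_label!r}")
    return src_lang, asr_predict(audio_path, src_lang)
```

Note that this picks a single language per audio file, so on its own it does not handle the code-switched speech from the original question. Running it on shorter segments is one possible workaround, but that is a suggestion and not something confirmed in this thread.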