seamless_communication
When using speech-to-text inference, how do we keep the src_lang the same as the tgt_lang?
Much real-world speech mixes two or more languages. For example, people speaking Japanese may use some English words. When we transcribe, we would like to keep the original text. How can we do that?
Even with ASR, we still need to pass a src_lang.
ASR
This is equivalent to S2TT with <tgt_lang>=<src_lang>:
transcribed_text, _, _ = translator.predict(<path_to_input_audio>, "asr", <src_lang>)
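To make the pattern concrete, here is a minimal sketch. The pure helper below just packages the "transcribe without translating" arguments (tgt_lang set equal to src_lang); the actual Translator calls are shown as comments because they require the seamless_communication package and a model download, and the exact class/checkpoint names ("seamlessM4T_v2_large", "vocoder_v2") are taken from the project's README and should be treated as assumptions.

```python
def asr_args(audio_path, src_lang):
    """Build keyword arguments for transcription where the output stays
    in the source language: S2TT with tgt_lang == src_lang (i.e. ASR)."""
    return {
        "input": audio_path,
        "task_str": "s2tt",   # ASR is just S2TT into the same language
        "tgt_lang": src_lang, # keep the original language in the output
        "src_lang": src_lang,
    }

# Usage against the real library (not executed here -- needs the model):
# from seamless_communication.inference import Translator
# translator = Translator("seamlessM4T_v2_large", "vocoder_v2", device=...)
# text_output, _ = translator.predict(**asr_args("audio.wav", "jpn"))
```

Note that for code-switched audio (Japanese with embedded English words) the model will still decode everything under the single language tag you pass, so mixed-language output is not guaranteed.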
I would like to know this as well. How can we set the target language to the source language in M4Tv2? For audio, you often don't know the language in advance. Is a LID model already integrated into M4Tv2, or do I need to run one beforehand?
No, spoken language identification is not integrated into the Seamless models. If you want it, you'll have to apply an external LID model, such as MMS. I provide more details in #325.
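A possible two-stage pipeline is sketched below: run an external LID model first, then feed the detected language code as both src_lang and tgt_lang. The executable part is only the tiny score-picking helper; the commented usage assumes the MMS LID checkpoint "facebook/mms-lid-126" loaded via Hugging Face transformers, which is an assumption here, not something from this thread.

```python
def pick_lang(scores):
    """Return the language label with the highest LID score."""
    return max(scores, key=scores.get)

# Usage with a real LID model (not executed here -- needs downloads;
# checkpoint name is an assumption, see the MMS release for options):
# import torch, torchaudio
# from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification
# lid = Wav2Vec2ForSequenceClassification.from_pretrained("facebook/mms-lid-126")
# extractor = AutoFeatureExtractor.from_pretrained("facebook/mms-lid-126")
# wav, sr = torchaudio.load("audio.wav")
# inputs = extractor(wav.squeeze().numpy(), sampling_rate=sr, return_tensors="pt")
# with torch.no_grad():
#     logits = lid(**inputs).logits
# lang = lid.config.id2label[int(logits.argmax(-1))]
# # then transcribe in the detected language:
# # translator.predict("audio.wav", "s2tt", tgt_lang=lang, src_lang=lang)
```

Note that the MMS LID label set and SeamlessM4T's language codes do not overlap perfectly, so a small mapping step between the two code sets may be needed.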