obs-localvocal

Request: Translation Output Language field to handle Traditional Chinese

Open Francoyy opened this issue 1 year ago • 2 comments

Chinese is a special case because it can be written in two scripts: Simplified and Traditional. Mainland China uses Simplified Chinese, while Traditional Chinese is used in Hong Kong and Taiwan in particular.

More info on the language codes here: https://cloud.google.com/translate/docs/languages
The code for Simplified Chinese is "zh-CN".
The code for Traditional Chinese is "zh-TW".

The dropdown menu in the LocalVocal filters only shows "Chinese", which maps to Simplified Chinese. It would be great to have two options instead: "Chinese (Simplified)" and "Chinese (Traditional)", similar to what Google Translate and most other translation engines offer.

With that setting missing, it is impossible to get Traditional Chinese translations, even though the different translation providers support it. My only alternative at this point would be to write a script that converts Simplified into Traditional in the text file output, but that is not straightforward, because there is no 1-to-1 mapping between Simplified and Traditional characters. The right character depends on context, so a translation engine would do it much better. Thanks!
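To illustrate why a character-by-character script falls short, here is a minimal Python sketch. The table `S2T_CANDIDATES` and both helper functions are hypothetical and hand-picked for this example; a real conversion table would be far larger and would need word-level context.

```python
# Hypothetical, hand-picked table: each Simplified character maps to one or
# more Traditional candidates. This is an illustration, not a real dataset.
S2T_CANDIDATES = {
    "头": ["頭"],
    "发": ["發", "髮"],  # 發 = to emit/develop, 髮 = hair
    "展": ["展"],
    "皇": ["皇"],
    "后": ["後", "后"],  # 後 = after/behind, 后 = queen/empress
}


def naive_s2t(text: str) -> str:
    """Convert character by character, always taking the first candidate."""
    return "".join(S2T_CANDIDATES.get(ch, [ch])[0] for ch in text)


def ambiguous_chars(text: str) -> list[str]:
    """Return the characters that have more than one Traditional candidate."""
    return [ch for ch in text if len(S2T_CANDIDATES.get(ch, [ch])) > 1]


print(naive_s2t("发展"))        # 發展 (correct: "development")
print(naive_s2t("头发"))        # 頭發 (wrong: "hair" should be 頭髮)
print(ambiguous_chars("皇后"))  # ['后']
```

The first candidate happens to be right for 发展 but wrong for 头发, which is the context dependence described above.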

Francoyy, Jan 07 '25 04:01

Edit 2: Writing something in the "Initial prompt" setting in your desired script (either Simplified or Traditional) can actually force the translation to be in the desired script if the "Model" is "Whisper-Based Translation".

Edit: Just realized that your issue was with the translation, not the transcription, and that you're an active participant on the OpenAI discussion thread. Sorry for the irrelevant response, good luck with finding a solution to your problem!

Whisper ASR accepts an initial prompt, much like an LLM, and a prompt written in either Simplified or Traditional will steer it towards your desired output. See the link below for the solution to your problem.

https://github.com/openai/whisper/discussions/277#discussioncomment-3832154

In my case, I had the opposite issue to yours. I wanted Simplified but kept getting Traditional, and prompting Whisper properly resolved my issue.

  1. For the LocalVocal Transcription audio filter, switch "Mode" from "Simple" to "Advanced" to expose more settings.
  2. Switch "Input Language" to "Auto detect". Alternatively, you can keep it as "Chinese" if you want to make sure it only outputs Chinese and not other languages.
  3. Scroll all the way down to find a setting called "Initial prompt". Write something in your desired script/character. Since I wanted Simplified, I wrote 俺们爱用简体字 in the box.
  4. Subtitles should be in your desired script/character now.

This solution has been tested on Whisper Tiny q5 and Whisper Medium q5, and it should work on other Whisper models too.
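For anyone scripting this outside OBS, the same idea can be sketched with the openai-whisper Python package, whose `transcribe` method accepts `language` and `initial_prompt` options. This is not LocalVocal's own code. The prompt sentences and the helper `transcribe_options` are assumptions made up for this example; any short sentence in the target script should serve the same purpose.

```python
# Illustrative prompts only (assumed wording, not from the thread).
SCRIPT_PROMPTS = {
    "simplified": "以下是简体中文的句子。",
    "traditional": "以下是繁體中文的句子。",
}


def transcribe_options(script: str) -> dict:
    """Build Whisper decoding options that nudge output towards one script."""
    if script not in SCRIPT_PROMPTS:
        raise ValueError(f"unknown script: {script!r}")
    return {"language": "zh", "initial_prompt": SCRIPT_PROMPTS[script]}


opts = transcribe_options("traditional")
print(opts)

# With openai-whisper installed, the options would be passed like this
# ("audio.wav" is a placeholder file name):
#   import whisper
#   model = whisper.load_model("tiny")
#   result = model.transcribe("audio.wav", **opts)
#   print(result["text"])
```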

huhwhat, Jul 25 '25 21:07

At the moment the list of languages is populated using the Whisper language codes. Whisper only uses the simple, 2-letter language codes, so it makes no distinction between language variants like Traditional and Simplified Chinese, likely because its training data was a mix of variants lumped together.

This is definitely doable, but it'll need a separate list of the longer language codes supported by the translation engines. It's on my todo list, but it's going to have to wait a while.

That said, I might be able to just add the ability to enter a custom language code instead of choosing from the list.
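As a rough sketch of both ideas (a separate variant list and a custom code override), something like the following could work. It is written in Python for brevity, and every name and list here is hypothetical rather than LocalVocal's actual data or code; the variant codes are the Google Translate ones quoted in the issue.

```python
# Hypothetical excerpt of a Whisper-style list of base language codes.
WHISPER_LANGUAGES = {"en": "English", "zh": "Chinese", "ja": "Japanese"}

# Hypothetical second list: longer codes used by translation engines.
TRANSLATION_VARIANTS = {
    "zh": {"zh-CN": "Chinese (Simplified)", "zh-TW": "Chinese (Traditional)"},
}


def output_language_choices() -> dict[str, str]:
    """Build the dropdown, expanding a base code into variants when known."""
    choices: dict[str, str] = {}
    for code, name in WHISPER_LANGUAGES.items():
        choices.update(TRANSLATION_VARIANTS.get(code, {code: name}))
    return choices


def resolve_target_code(selected: str, custom: str = "") -> str:
    """A non-empty custom code overrides the dropdown selection."""
    custom = custom.strip()
    return custom if custom else selected


print(output_language_choices())
print(resolve_target_code("zh-CN", custom="zh-TW"))  # zh-TW
```

The custom field alone would already unblock Traditional Chinese, while the variant list would make the option discoverable in the dropdown.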

Tabby, Oct 31 '25 02:10