transformers

whisper identified the wrong language

Open LYPinASR opened this issue 2 years ago • 4 comments

Feature request

When I follow the long-form transcription example for whisper-large with Korean audio, the output is English. After fine-tuning the whisper-large model on some Korean data, however, the checkpoint outputs Korean. I also tested the other model sizes, and all of them output English. This confuses me. How can I get Korean output from the original model? Thank you!

Motivation

Test whisper in Korean.

Your contribution

Test whisper in Korean.

LYPinASR, Apr 26 '23 14:04

Hi there. Questions like this are better suited to the forums or a discussion on the model page, as we keep issues for bugs and feature requests only.

sgugger, Apr 26 '23 14:04

If you use the pipeline, you should pass an option like generate_kwargs = {"task":"transcribe", "language":"<|fr|>"}, replacing the language token with the one for your language.

ref1: https://colab.research.google.com/drive/1rS1L4YSJqKUH_3YxIQHBI982zso23wor#scrollTo=dPD20IkEDsbG
ref2: https://github.com/huggingface/transformers/issues/22331

However, I think the default task should be "transcribe", not "translate". I maintain that this is a bug.
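The generate_kwargs suggestion above can be sketched roughly as follows. This is a minimal sketch, not a confirmed fix: the helper names build_generate_kwargs and transcribe, the default checkpoint "openai/whisper-large", and the 30-second chunk length are assumptions chosen for illustration, and whether the language option is honored may depend on the installed transformers version.

```python
def build_generate_kwargs(language: str, task: str = "transcribe") -> dict:
    """Build the options dict in the form suggested in this thread.

    `language` is a short code such as "ko" or "fr"; it is wrapped into
    the "<|xx|>" token form used in the comment above.
    """
    return {"task": task, "language": f"<|{language}|>"}


def transcribe(audio_path: str, language: str = "ko",
               model_name: str = "openai/whisper-large") -> str:
    """Hypothetical helper: run the ASR pipeline with an explicit language and task."""
    # Imported lazily so that build_generate_kwargs can be used without
    # transformers installed.
    from transformers import pipeline

    pipe = pipeline(
        "automatic-speech-recognition",
        model=model_name,      # assumed checkpoint name
        chunk_length_s=30,     # assumed chunk length for long-form audio
    )
    result = pipe(audio_path, generate_kwargs=build_generate_kwargs(language))
    return result["text"]
```

Calling transcribe("sample.wav", language="ko") would then request Korean transcription rather than relying on the model's defaults; "sample.wav" is a placeholder path.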

chenht2021, Apr 27 '23 09:04

I have solved the problem.
Step 1: Upgraded transformers. Not fixed.
Step 2: Added an option like generate_kwargs = {"task":"transcribe", "language":"<|fr|>"}. Not fixed.
Step 3: Added a line like pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language="ko", task="transcribe"). Fixed.

However, I still don't understand why the original model outputs English while the fine-tuned model outputs Korean.
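For reference, the forced_decoder_ids line that fixed it can be wrapped in a small helper. This is only a sketch of that one line: the function name force_language is made up for illustration, and it assumes pipe is an automatic-speech-recognition pipeline built from a Whisper checkpoint, as in the long-form transcription example. Other transformers versions may handle forced_decoder_ids differently.

```python
def force_language(pipe, language: str = "ko", task: str = "transcribe"):
    """Pin the decoder prompt so generation starts with the given language and task.

    `pipe` is assumed to expose `pipe.tokenizer` (a Whisper tokenizer) and
    `pipe.model.config`, as an ASR pipeline does.
    """
    # Ask the tokenizer for the (position, token_id) pairs that encode
    # the chosen language and task.
    prompt_ids = pipe.tokenizer.get_decoder_prompt_ids(language=language, task=task)
    # Store them on the model config so that later calls to the pipeline use them.
    pipe.model.config.forced_decoder_ids = prompt_ids
    return prompt_ids
```

After force_language(pipe, "ko"), later calls to pipe(...) should decode with the Korean transcription prompt instead of whatever the checkpoint would otherwise pick.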

LYPinASR, Apr 27 '23 09:04

Maybe you can check your fine-tuned model's config.json or generation_config.json and look at the default task type. I think it's null or "transcribe".
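One way to do that check is to read the two JSON files from the local checkpoint directory and print the fields that influence language and task. This is a rough sketch: the helper name decoding_defaults is hypothetical, and the key names forced_decoder_ids, task, and language are assumptions about what those files may contain. A missing key is reported as None.

```python
import json
from pathlib import Path

# Keys assumed to be relevant to the language/task defaults.
KEYS = ("forced_decoder_ids", "task", "language")


def decoding_defaults(checkpoint_dir) -> dict:
    """Collect the language/task related fields from a checkpoint's config files."""
    found = {}
    for name in ("config.json", "generation_config.json"):
        path = Path(checkpoint_dir) / name
        if not path.is_file():
            continue  # not every checkpoint ships both files
        data = json.loads(path.read_text(encoding="utf-8"))
        found[name] = {key: data.get(key) for key in KEYS}
    return found
```

Running decoding_defaults on both the original and the fine-tuned checkpoint directories and comparing the results should show whether the two models start from different defaults.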

chenht2021, Apr 27 '23 09:04

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot], May 26 '23 15:05