transformers
transformers copied to clipboard
whisper model's default task should be "transcribe"
System Info
transformersversion: 4.27.2- Platform: Linux-3.10.0-1062.9.1.el7.x86_64-x86_64-with-glibc2.17
- Python version: 3.9.16
- Huggingface_hub version: 0.13.2
- PyTorch version (GPU?): 1.12.1 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help?
@sanchit-gandhi @ArthurZucker
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the
examplesfolder (such as GLUE/SQuAD, ...) - [ ] My own task or dataset (give details below)
Reproduction
In transformers v4.26.1, the following script will output right language, some chinese text. It's the right task "transcribe". However, in version 4.27.2, it will output translated english text, which is another task "translate".
Expected behavior
Do ASR, and Output chinese
cc @ArthurZucker and @sanchit-gandhi 🙏
Hey! As you can see here the default (if the generation_config does not have a task set is still transcribe. What changed is the configuration.json see this commit where the default went from transcribe ( 50358) to translate (50359 in the forced_decoder_ids). The update in transformers just makes sure to properly use this, while the previous version did not take it into account.
This is more a fix than a breaking change IMO
Thank you for your explanation.