PeerTube
Transcription: use the video language if provided instead of auto-detection
Describe the current behavior
The transcription language of this video is misdetected: https://vhsky.cz/w/ajtLAaxCKVH5YgyRJouzYH. Instead of being transcribed in English, the captions come out translated into Czech.
What we have checked so far:
- Checked the video language in PeerTube: it is set correctly
- Re-generated the transcription in case the wrong language was set on upload: the result is still translated into Czech
- Checked the instance for anything that might force Czech: found nothing, and even the main instance language is set to English
It seems to me that WhisperAI is guessing the language and getting it wrong.
Steps to reproduce
- Try to generate English captions for this video
- The captions come out translated into Czech instead of simply being transcribed
Describe the expected behavior
WhisperAI should use the language set in the PeerTube video details, if available, before trying to guess the language.
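To illustrate the request, here is a rough sketch of the fallback logic I have in mind. It is not PeerTube's actual code: `TranscriptionRequest` and `buildWhisperArgs` are hypothetical names, and I am assuming the transcription engine is invoked through a Whisper-style command line that accepts `--model` and `--language`.

```typescript
// Hypothetical sketch, not PeerTube's real implementation.
// Assumes a Whisper-style CLI that accepts --model and --language.
interface TranscriptionRequest {
  mediaPath: string
  model: string
  // Language from the video details (e.g. "en"); null or undefined when unset
  videoLanguage?: string | null
}

function buildWhisperArgs (req: TranscriptionRequest): string[] {
  const args = [ req.mediaPath, '--model', req.model ]

  const language = req.videoLanguage?.trim()
  if (language) {
    // The uploader set a language: pass it explicitly so the engine does not guess
    args.push('--language', language)
  }
  // Otherwise no --language argument is added and the engine auto-detects

  return args
}
```

With this approach, a video marked as English would always be transcribed as English, and auto-detection would only be used when no language is set on the video.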
Additional information
- PeerTube instance:
  - URL: https://vhsky.cz
  - Version: 7.3.0
- Transcription engine: CTranslate2
- Model: Large-v3