
Sub-optimal presentation of auto-transcribed subtitles

Open polsola opened this issue 1 year ago • 5 comments

Hello, thanks for this great work. Tero Subtitler has become one of our most used tools; it's great.

I've just noticed a bug, but only on certain videos: Tero Subtitler transcribes the audio correctly but generates subtitles with really long spans, around 20 seconds of text in a single subtitle.

The weird thing is that we have a series of videos with the same voice and format; it happens in some and not in others. I've checked, and all of them have the same resolution, FPS, etc.

All the videos start with about 5-8 seconds of silence; I don't know whether that affects the result.


I've checked, and the subtitle duration setting is at its default of 7000 ms.

polsola avatar Feb 12 '24 15:02 polsola

Hello, and thank you for your kind words.

I think you are using macOS. You can try the test build at https://github.com/URUWorks/additional-files/raw/main/terosubtitler_testbuild_macOS64.zip, where you can choose the maximum line length.

Another option is to split the entry yourself with the divide function, from the "Edit/Entries/Divide entry" menu. I hope this helps.

URUWorks avatar Feb 12 '24 17:02 URUWorks

Thanks! The option to set a maximum line length is great. It would be awesome if it avoided cutting words, but it's a great fix for now.
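For illustration, splitting text at a maximum line length without cutting words can be sketched with Python's standard `textwrap` module. This is not how Tero Subtitler does it; the function name and the 42-character default are assumptions made for the example.

```python
import textwrap


def wrap_without_cutting_words(text: str, max_chars: int = 42) -> list[str]:
    """Split text into lines of at most max_chars, breaking only at spaces.

    max_chars=42 is an assumed default, not a Tero Subtitler setting.
    """
    return textwrap.wrap(
        text,
        width=max_chars,
        break_long_words=False,   # never cut inside a word
        break_on_hyphens=False,   # keep hyphenated words together
    )


lines = wrap_without_cutting_words(
    "Tero Subtitler has become one of our most used tools for transcription",
    max_chars=20,
)
for line in lines:
    print(line)
```

A single word longer than `max_chars` is kept whole on its own line, so that line can still exceed the limit and would need manual attention.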

polsola avatar Feb 14 '24 14:02 polsola

I think a superior algorithm is needed (to match other tools/services — some of which might not be quite there either, incidentally — including EZTitles and Happy Scribe) to account for the following (possibly more):

  • Linguistic units [not splitting them]
  • Maximum display time [not exceeding seven seconds]
  • Characters per line (CPL) count [not exceeding 42]
  • Characters per second (CPS) count [not exceeding 20]
  • Shot changes [assuming there is a list imported]
  • Gapping [two frames between entries]

It could be that some breaches are inevitable and will require user intervention after processing, but some simple improvements would be desirable.
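As a rough sketch of how the numeric limits in the list above could be checked after processing, the following Python flags entries that breach the duration, CPL, CPS, or gap limits so a user can review them. The `Entry` type, the function name, the 25 fps default, and the way characters are counted (spaces included, line breaks excluded) are all assumptions for the example; linguistic units and shot changes are not covered.

```python
from dataclasses import dataclass

MAX_DURATION_S = 7.0   # maximum display time, in seconds
MAX_CPL = 42           # characters per line
MAX_CPS = 20.0         # characters per second
MIN_GAP_FRAMES = 2     # frames between consecutive entries


@dataclass
class Entry:
    start: float  # seconds
    end: float    # seconds
    text: str     # lines separated by "\n"


def find_breaches(entries: list[Entry], fps: float = 25.0) -> list[tuple[int, str]]:
    """Return (entry index, rule name) pairs for every limit that is exceeded."""
    breaches = []
    min_gap = MIN_GAP_FRAMES / fps
    for i, entry in enumerate(entries):
        duration = entry.end - entry.start
        chars = len(entry.text.replace("\n", ""))  # assumed counting rule
        if duration > MAX_DURATION_S:
            breaches.append((i, "duration"))
        if any(len(line) > MAX_CPL for line in entry.text.split("\n")):
            breaches.append((i, "cpl"))
        if duration > 0 and chars / duration > MAX_CPS:
            breaches.append((i, "cps"))
        # Small tolerance so float rounding does not flag an exact two-frame gap.
        if i + 1 < len(entries) and entries[i + 1].start - entry.end < min_gap - 1e-9:
            breaches.append((i, "gap"))
    return breaches


entries = [
    Entry(0.0, 20.0, "x" * 50),     # too long on screen, line too long
    Entry(20.01, 21.01, "a" * 30),  # starts too soon after the first, reads too fast
]
print(find_breaches(entries))
```

A report like this does not fix anything by itself; it only points to the entries that would still need user intervention.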

See this.

chenlung avatar Feb 14 '24 14:02 chenlung

Requested elsewhere: #323

chenlung avatar Sep 02 '24 13:09 chenlung

Some guidelines for designing the future structure of Whisper auto-transcribed subtitles:

Whisper's voice-to-text transcription is really excellent. Word-level detection is almost perfect, and timing is good too. But the line division takes no account of the reader's perspective. Human-made subtitles follow certain considerations when dividing lines and subtitles.

I would like Tero's voice-to-text transcription to take into account, when creating subtitles, some considerations about the division of lines and subtitles, beyond not exceeding x characters (which it already does effectively).

These considerations could be given to the transcriber as instructions:

  a) Silences and interpretive pauses (maybe 0.5 seconds or more without words) should be where lines or subtitles are divided, regardless of the length of the line or subtitle.
  b) Grammatical pauses and punctuation marks should be where lines or subtitles are divided, regardless of the length of the line or subtitle.
  c) Put conjunctions and connectives (for example "and", "or", etc.) on the bottom line.
  d) Do not split noun, verb, and prepositional phrases across lines (for example: [image]).
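Rules a) and b) can be sketched in Python, assuming word-level timestamps are available as (text, start, end) records. The `Word` type, the function name, and the punctuation set are assumptions for the example; rules c) and d) need grammatical analysis and are not attempted here.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


PAUSE_S = 0.5  # silence treated as a break point, per rule a)
BREAK_PUNCT = (".", ",", ";", ":", "?", "!")  # assumed set for rule b)


def split_at_pauses_and_punctuation(words: list[Word]) -> list[list[Word]]:
    """Group words, starting a new group after a pause or a punctuation mark."""
    groups: list[list[Word]] = []
    current: list[Word] = []
    for i, word in enumerate(words):
        current.append(word)
        if i == len(words) - 1:
            break
        pause = words[i + 1].start - word.end
        if pause >= PAUSE_S or word.text.endswith(BREAK_PUNCT):
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


words = [
    Word("Hello,", 0.0, 0.4),
    Word("this", 0.5, 0.7),
    Word("is", 0.7, 0.8),
    Word("Tero", 0.8, 1.1),
    Word("and", 2.0, 2.2),
    Word("more.", 2.2, 2.6),
]
for group in split_at_pauses_and_punctuation(words):
    print(" ".join(w.text for w in group))
```

Breaking at every comma follows rule b) literally and may produce very short groups; a fuller version would probably merge neighbouring groups while they still fit the length and duration limits.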

I am attaching the UNE 153010 standard, which is useful for understanding and implementing this: UNE_153010_2012.pdf https://github.com/user-attachments/files/16770767/UNE_153010_2012.pdf

It would be great for the transcriber to identify speakers, utilizing the diarization functionality of Whisper. Great work! Thanks!

PS: I offer myself as a subtitling consultant, as I work with Tero regularly, following industry standards.

cuentacuentoscaminantes avatar Sep 23 '24 18:09 cuentacuentoscaminantes