subby
Advanced subtitle converter and processor
Some subtitles use different styles for different speakers; those could be converted to dashes. ```xml Someone get that, will you? Liam's turn. Are you gonna do it again before school?...
Some subtitles contain elements, which have differently timed lines within one \. Those should probably be treated as two separate lines. The current parser just skips them, which is suboptimal. Might...
I bumped into this issue: a third new line, a double hyphen, and an unnecessary hyphen. Original: ``` 1 00:05:26,800 --> 00:05:31,200 - Öhm... Bocsánat... Mr.Teufel... de a... - Mi a baj,...
Currently `CommonIssuesFixer` is pretty large in scope. It would be good to break it up. Another useful feature would be per-language processing, for example spaces before question marks are always...
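One way the split could look is a set of small single-purpose fixers, with a registry that maps language codes to the extra fixers that apply only to them. This is a rough sketch, not subby's actual design: the function names, the `PER_LANGUAGE` registry, and the choice of `"en"` as the language where the question-mark rule applies are all assumptions made for illustration.

```python
import re
from typing import Callable

Fixer = Callable[[str], str]


def collapse_double_spaces(text: str) -> str:
    """Language-independent fixer (hypothetical): squeeze runs of spaces."""
    return re.sub(r" {2,}", " ", text)


def remove_space_before_question_mark(text: str) -> str:
    """Language-specific fixer (hypothetical): drop spaces before '?'."""
    return re.sub(r" +\?", "?", text)


# Fixers applied to every subtitle, regardless of language.
COMMON: list[Fixer] = [collapse_double_spaces]

# Extra fixers per language code. Which languages need which rule is an
# assumption here; the real mapping would have to be decided per rule.
PER_LANGUAGE: dict[str, list[Fixer]] = {
    "en": [remove_space_before_question_mark],
}


def fix(text: str, language: str) -> str:
    """Run the common fixers, then the ones registered for `language`."""
    for fixer in COMMON + PER_LANGUAGE.get(language, []):
        text = fixer(text)
    return text
```

With this layout each rule can be tested on its own, and a language with no registered rules simply gets the common fixers.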
Some SDH subtitles prefix one sentence with a speaker name, and the second one with nothing (or both with a speaker name). After stripping, those look like a single sentence,...
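A possible approach is to remember whether any line in a cue carried a speaker prefix and, if so, mark every line with a dash after stripping. The sketch below assumes speaker names are written in upper case followed by a colon (for example `JOHN:`); both that convention and the `strip_speakers` helper are hypothetical and not taken from subby's code.

```python
import re

# Assumed convention: an upper-case speaker name and a colon at line start.
SPEAKER = re.compile(r"^[A-Z][A-Z .'-]*:\s*")


def strip_speakers(lines: list[str]) -> list[str]:
    """Strip speaker prefixes from one cue's lines.

    If at least one line had a prefix and the cue has several lines,
    each line is prefixed with "- " so the speaker change stays visible.
    Lines that already start with a dash are not handled in this sketch.
    """
    stripped = [SPEAKER.sub("", line) for line in lines]
    had_speaker = any(new != old for new, old in zip(stripped, lines))
    if had_speaker and len(lines) > 1:
        return [f"- {line}" for line in stripped]
    return stripped
```

For example, `strip_speakers(["JOHN: Where is it?", "Right here."])` yields two dashed lines instead of something that reads like one continuous sentence.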
BeautifulSoup4 is currently used for TTML subtitle conversion, and while it handles this job well, it's certainly not the fastest option.
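One candidate replacement is the standard library's `xml.etree.ElementTree`. The sketch below only shows how basic TTML cues could be read with it; the `parse_ttml` helper and the sample document are made up for illustration, and whether this is actually faster for subby's inputs would need to be measured.

```python
import xml.etree.ElementTree as ET

# Namespace of the TTML vocabulary; ElementTree exposes it as a tag prefix.
TTML_NS = "{http://www.w3.org/ns/ttml}"

SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml">
  <body><div>
    <p begin="00:00:01.000" end="00:00:02.500">Hello<br/>world</p>
    <p begin="00:00:03.000" end="00:00:04.000">Second cue</p>
  </div></body>
</tt>"""


def _text(elem: ET.Element) -> str:
    """Collect an element's text in document order, mapping <br/> to a newline."""
    parts = ["\n"] if elem.tag == f"{TTML_NS}br" else []
    parts.append(elem.text or "")
    for child in elem:
        parts.append(_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def parse_ttml(source: str) -> list[tuple[str, str, str]]:
    """Return (begin, end, text) for every <p> element in the document."""
    root = ET.fromstring(source)
    return [
        (p.get("begin"), p.get("end"), _text(p).strip())
        for p in root.iter(f"{TTML_NS}p")
    ]
```

Styling, regions, and timing inherited from parent elements are ignored here; a real converter would need to handle those as well.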
Some providers use terrible flowing subtitles, probably converted from NA broadcast captions. The previous attempt was highly imperfect and corrupted some otherwise good subs, so it was removed in 94e2b96c....
This would be nice to have, but it's error-prone. It would require a dictionary, similar to SubtitleEdit, as well as a list of user-provided names, which could be specific to...
Currently, tests are pretty rudimentary and only cover some of the processing. SDH stripping in particular would benefit from better coverage.