
Add support for WebVTT and MicroDVD

Open Diaoul opened this issue 10 years ago • 3 comments

~~Will require switching to pycaption for validation~~ pycaption is not compatible with Python 3; abandoned project?


Diaoul avatar Jul 13 '15 13:07 Diaoul

@Toilal @wackou: I want to write my own robust subtitle parser and will likely create a new library for it that handles various formats. I'm looking for the right tool for the job: all subtitle formats seem to have a well-defined grammar, which should make parsing straightforward. There are various technologies for this (PEG parsers, lexer and parser generators such as Lex and Yacc, and so on). Would you recommend one for this kind of work?

I saw various tools such as pyparsing, PLY, pyPEG and parsimonious. I wonder if rebulk would be able to do that? There's no decision making involved here, so I don't think it's the right tool. There is also the possibility of writing my own basic parser based on str and re.
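To make the "str and re" option concrete, here is a minimal sketch for the two formats in the title. It is only an illustration, not what subliminal does: the names `Cue`, `parse_microdvd_line` and `parse_vtt_timing` are hypothetical, the 25 fps default is an assumption (MicroDVD is frame-based, so the real frame rate has to come from somewhere), and formatting tags, WebVTT headers and cue settings are ignored.

```python
import re
from typing import NamedTuple, Optional, Tuple

# MicroDVD lines look like "{start_frame}{end_frame}text", with "|" as a line break.
MICRODVD_LINE = re.compile(r"^\{(\d+)\}\{(\d+)\}(.*)$")

# WebVTT timestamps are "mm:ss.ttt" or "hh:mm:ss.ttt"; a timing line joins two with "-->".
VTT_TIMESTAMP = r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
VTT_TIMING = re.compile(rf"^{VTT_TIMESTAMP}\s+-->\s+{VTT_TIMESTAMP}")


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


def parse_microdvd_line(line: str, fps: float = 25.0) -> Optional[Cue]:
    """Parse one MicroDVD line; return None if the line does not match."""
    match = MICRODVD_LINE.match(line.strip())
    if match is None:
        return None
    start_frame, end_frame, text = match.groups()
    # Frame numbers are converted to seconds with the assumed frame rate.
    return Cue(int(start_frame) / fps, int(end_frame) / fps, text.replace("|", "\n"))


def _to_seconds(hours: Optional[str], minutes: str, seconds: str, millis: str) -> float:
    # The hours field is optional in WebVTT, so it may be None here.
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt_timing(line: str) -> Optional[Tuple[float, float]]:
    """Parse a WebVTT cue timing line into (start, end) seconds, or None."""
    match = VTT_TIMING.match(line.strip())
    if match is None:
        return None
    groups = match.groups()
    return _to_seconds(*groups[:4]), _to_seconds(*groups[4:])
```

For example, `parse_microdvd_line("{25}{75}Hello|world")` yields a cue from 1.0 s to 3.0 s at the assumed 25 fps. A regex per line is probably enough for MicroDVD; whether it stays manageable for WebVTT's block structure and styling is exactly the open question.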

Ideas are welcome :fish_cake:

Diaoul avatar Nov 06 '15 20:11 Diaoul

Do you have examples and/or specs for those formats?

Rebulk can be used for "short input" and "pseudo-natural" language. I don't think it's the right tool to parse a structured file. It's designed to define patterns (string, regex or functional) that will be scanned across the whole input string, retrieve consistent match objects from those different types of patterns, and filter out false positives with rules that describe relations between those matches.

I've never used the parsers you mentioned in Python, sorry :)

Toilal avatar Nov 06 '15 20:11 Toilal

You can find examples here:

  • https://en.wikipedia.org/wiki/SubRip
  • https://en.wikipedia.org/wiki/WebVTT
  • https://en.wikipedia.org/wiki/MicroDVD

Diaoul avatar Nov 06 '15 20:11 Diaoul

Implemented using the pysubs2 module in #1207

getzze avatar Feb 25 '25 21:02 getzze