webvtt-py
Manage color styles during conversion to SRT
Using:
vtt = webvtt.read('styles2.vtt')
vtt.save_as_srt('styles2_converted.srt')
with this input:
WEBVTT
STYLE
::cue {
font-family: Verdana, Arial, Tiresias;
line-height: 125%;
}
::cue(.white) {
color: #ffffff;
}
::cue(.lime) {
color: #00ff00;
}
::cue(.cyan) {
color: #00ffff;
}
::cue(.red) {
color: #ff0000;
}
::cue(.yellow) {
color: #ffff00;
}
::cue(.magenta) {
color: #ff00ff;
}
::cue(.blue) {
color: #0000ff;
}
::cue(.black) {
color: #000000;
}
::cue(.bg_black) {
background: rgba(0, 0, 0, 0.76);
}
sub0
00:00:07.120 --> 00:00:09.480 line:-1
<c.magenta.bg_black>Musique douce</c>
sub1
00:00:09.720 --> 00:00:29.520 align:left line:-1
<c.magenta.bg_black>---</c>
sub2
00:00:32.439 --> 00:00:35.320 line:-1
<c.magenta.bg_black>Musique douce</c>
sub3
00:00:35.560 --> 00:02:25.240 align:left line:-1
<c.magenta.bg_black>---</c>
sub4
00:02:25.480 --> 00:02:27.440 line:-1
<c.white.bg_black>-Stéphane ? Où on se gare ?</c>
sub5
00:02:27.680 --> 00:02:29.280 align:left line:-1
<c.white.bg_black>-Euh, là-bas, au chêne.</c>
should give:
1
00:00:07,120 --> 00:00:09,480
<font color="#ff00ff">Musique douce</font>
2
00:00:09,720 --> 00:00:29,520
<font color="#ff00ff">---</font>
3
00:00:32,439 --> 00:00:35,320
<font color="#ff00ff">Musique douce</font>
4
00:00:35,560 --> 00:02:25,240
<font color="#ff00ff">---</font>
5
00:02:25,480 --> 00:02:27,440
<font color="#ffffff">-Stéphane ? Où on se gare ?</font>
6
00:02:27,680 --> 00:02:29,280
<font color="#ffffff">-Euh, là-bas, au chêne.</font>
instead of:
1
00:00:07,120 --> 00:00:09,480
<c.magenta.bg_black>Musique douce</c>
2
00:00:09,720 --> 00:00:29,520
<c.magenta.bg_black>---</c>
3
00:00:32,439 --> 00:00:35,320
<c.magenta.bg_black>Musique douce</c>
4
00:00:35,560 --> 00:02:25,240
<c.magenta.bg_black>---</c>
5
00:02:25,480 --> 00:02:27,440
<c.white.bg_black>-Stéphane ? Où on se gare ?</c>
6
00:02:27,680 --> 00:02:29,280
<c.white.bg_black>-Euh, là-bas, au chêne.</c>
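To make the requested behaviour concrete, here is a minimal standalone sketch of the mapping, assuming only class-based `::cue(.name)` rules with a `color` declaration need handling. The names `parse_cue_colors` and `colorize` are hypothetical and not part of webvtt-py; the regex approach ignores nested `<c>` tags and identifier-based selectors.

```python
import re

# Matches rules such as "::cue(.magenta) { color: #ff00ff; }" (class selectors only).
STYLE_RULE = re.compile(r"::cue\(\.([\w-]+)\)\s*\{([^}]*)\}")
# Matches a "color" declaration but not "background-color".
COLOR_DECL = re.compile(r"(?<![\w-])color\s*:\s*([^;]+);?")
# Matches a non-nested <c.class1.class2>...</c> span.
C_TAG = re.compile(r"<c((?:\.[\w-]+)*)>(.*?)</c>", re.DOTALL)


def parse_cue_colors(style_block):
    """Return {class name: color} for every ::cue(.class) rule that sets a color."""
    colors = {}
    for name, body in STYLE_RULE.findall(style_block):
        match = COLOR_DECL.search(body)
        if match:
            colors[name] = match.group(1).strip()
    return colors


def colorize(text, colors):
    """Replace <c.classes> spans with <font color="..."> when a class has a color."""
    def repl(match):
        classes = match.group(1).split(".")[1:]
        color = next((colors[c] for c in classes if c in colors), None)
        inner = match.group(2)
        # Classes without a known color (e.g. bg_black) are simply dropped.
        return f'<font color="{color}">{inner}</font>' if color else inner
    return C_TAG.sub(repl, text)


style = """::cue(.magenta) {
color: #ff00ff;
}
::cue(.bg_black) {
background: rgba(0, 0, 0, 0.76);
}"""
colors = parse_cue_colors(style)
print(colorize("<c.magenta.bg_black>Musique douce</c>", colors))
# prints: <font color="#ff00ff">Musique douce</font>
```

SRT has no standard equivalent of the background rule, so this sketch discards `bg_black` rather than trying to translate it.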
While reading this doc I remembered why I use html.unescape. It's because of the ampersand (&amp;) and greater-than (&gt;) escape sequences, not because of tags.
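A short illustration of that point, using only the standard library: `html.unescape` decodes character references and leaves tags untouched. The sample strings are made up for the example.

```python
import html

escaped = "Tom &amp; Jerry: 3 &gt; 2"
decoded = html.unescape(escaped)
print(decoded)  # prints: Tom & Jerry: 3 > 2

# Tags contain no character references, so they pass through unchanged.
tagged = "<i>Some italic</i>"
print(html.unescape(tagged))  # prints: <i>Some italic</i>
```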
WebVTT styling is a little complex when you take into account the use of identifiers. I was thinking of a solution using an HTML parser.
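One way the HTML-parser idea could look, as a rough sketch built on the standard library's `html.parser.HTMLParser` rather than anything in webvtt-py. The class `CueToSrt` and the helper `cue_to_srt` are hypothetical names. The sketch assumes class-based colors only, drops tag annotations such as the speaker in `<v Name>`, and relies on the parser lower-casing tag names, so class names are matched in lower case.

```python
from html.parser import HTMLParser


class CueToSrt(HTMLParser):
    """Rewrite WebVTT cue text, turning coloured <c> spans into <font> tags."""

    def __init__(self, colors):
        # convert_charrefs=True makes handle_data receive unescaped text.
        super().__init__(convert_charrefs=True)
        self.colors = colors
        self.out = []
        self.font_stack = []  # one bool per open <c>: did it emit a <font>?

    def handle_starttag(self, tag, attrs):
        # The parser reports "<c.magenta.bg_black>" as tag "c.magenta.bg_black".
        name, *classes = tag.split(".")
        if name == "c":
            color = next((self.colors[c] for c in classes if c in self.colors), None)
            self.font_stack.append(color is not None)
            if color is not None:
                self.out.append(f'<font color="{color}">')
        else:
            self.out.append(f"<{name}>")

    def handle_endtag(self, tag):
        if tag == "c":
            if self.font_stack and self.font_stack.pop():
                self.out.append("</font>")
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)


def cue_to_srt(text, colors):
    parser = CueToSrt(colors)
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


print(cue_to_srt("<c.magenta.bg_black><i>Some italic</i> and normal text</c>",
                 {"magenta": "#ff00ff"}))
# prints: <font color="#ff00ff"><i>Some italic</i> and normal text</font>
```

Compared with a regular expression, the parser keeps track of nesting, so tags inside a `<c>` span are passed through in order.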
Thanks for the info.
I'll add a new commit.
For the moment, I'm doing this in webvtt.structures.Caption.replace_color:
def replace_color(x, tag, v):
    return ("" if tag == "c" else ("<" + tag + ">")) \
        + "<font color=\"" + v + "\">" \
        + x.group(1) \
        + "</font>" \
        + ("" if tag == "c" else ("</" + tag + ">"))
Here, x.group(1) should be html.unescape(x.group(1)).
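With that suggestion applied, the function might look like the sketch below. The signature comes from the snippet above; how webvtt-py actually calls it is not shown in this thread, so the `pattern` and `line` used to exercise it are assumptions made for illustration.

```python
import html
import re


def replace_color(x, tag, v):
    # x is a regex match whose group(1) holds the text inside the tag.
    inner = html.unescape(x.group(1))
    # A plain <c> wrapper is dropped; any other tag (e.g. <i>) is kept.
    open_tag = "" if tag == "c" else "<" + tag + ">"
    close_tag = "" if tag == "c" else "</" + tag + ">"
    return open_tag + '<font color="' + v + '">' + inner + "</font>" + close_tag


# Hypothetical caller: the pattern and colour value are assumptions.
pattern = re.compile(r"<c\.magenta\.bg_black>(.*?)</c>")
line = "<c.magenta.bg_black>Tom &amp; Jerry</c>"
result = pattern.sub(lambda m: replace_color(m, "c", "#ff00ff"), line)
print(result)  # prints: <font color="#ff00ff">Tom & Jerry</font>
```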
An input could contain a tag within a tag:
for example: <c.magenta.bg_black><i>Some italic</i> and normal coloured text. By the way, 2 < 3 !</c>
I ran some tests, and it renders well with this output:
<font color="#ff00ff"><i>Some italic</i> and normal coloured text. By the way, 2 < 3 ! </font>