captioning
WEBVTT: Issue when the file contains multiple blank lines
When parsing a WEBVTT file in which a cue timing line is followed by two blank lines, the parser throws an error. I read the standard, and it is acceptable for there to be a blank line representing silence. I get files generated this way from an outside vendor. Is there any way this can be resolved?
See the example file below. Also attached is a zip with one working version and one non-working version of the same VTT file.
WEBVTT

00:00:00.000 --> 00:00:04.100 align:middle line:90%


00:00:04.100 --> 00:00:14.690 align:middle line:84%
Foreign policy is a very important aspect of our
government and impacts greatly on our national security.
Please let me know... Thanks, Jason
I'm having the exact same issue. Is there a fix for this?
I encountered the issue trying to use this library.
You can try the code below to correct the content at runtime. I used it to fix the content of 50 VTT subtitle files; other files may need some additional adjustment, but in general that's it.
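<?php
// Assumes the library was installed with Composer and that $file holds the path to the VTT file.
require 'vendor/autoload.php';

$contents = file_get_contents($file);
// Collapse runs of three or more line breaks (CRLF, CR, or LF) into a single LF.
$contents = preg_replace('`(\x0d\x0a){3,}`', "\x0a", $contents);
$contents = preg_replace('`(\x0d){3,}`', "\x0a", $contents);
$contents = preg_replace('`(\x0a){3,}`', "\x0a", $contents);
// A timestamp such as 00:00:04.100, with an optional hours part; the dot is escaped so it matches literally.
$ts = '((?:[0-9]{2,}:)?[0-9]{2}:[0-9]{2}\.[0-9]{3})';
// Where two cue timing lines follow each other directly, separate them with exactly one blank line.
$pattern = '`'.$ts.' --> '.$ts.'( .*)?[\x0a]+'.$ts.' --> '.$ts.'( .*)?`';
$contents = preg_replace_callback($pattern, function ($match) {
    // Group 6 (settings of the second cue) is absent from $match when it did not participate in the match.
    return sprintf("%s --> %s%s\x0a\x0a%s --> %s%s", $match[1], $match[2], $match[3], $match[4], $match[5], isset($match[6]) ? $match[6] : '');
}, $contents);
// Make sure the file ends with a single newline.
$contents = trim($contents)."\x0a";
$parser = new \Captioning\Format\WebvttFile();
try {
    $parser->loadFromString($contents);
    print('Parser Success'.'<br/>'."\x0a");
} catch (\Exception $e) {
    print($e->getMessage().'<br/>'."\x0a");
    exit();
}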
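If you only need the blank-line cleanup, a shorter variant of the same idea might look like the sketch below. The name normalizeVttBlankLines is a hypothetical helper, not part of the library, and I have only reasoned about it against the example file from the first post, so treat it as a starting point rather than a verified fix.

```php
<?php
// Hypothetical helper (not part of the Captioning library): collapses runs
// of blank lines so that blocks are separated by exactly one blank line.
function normalizeVttBlankLines(string $contents): string
{
    // Normalize CRLF and lone CR line endings to LF.
    $contents = preg_replace('/\r\n?/', "\n", $contents);
    // Three or more consecutive LFs mean two or more blank lines; keep one.
    $contents = preg_replace('/\n{3,}/', "\n\n", $contents);
    // End the file with a single newline.
    return trim($contents) . "\n";
}
```

The result can then be passed to loadFromString() in the same way as in the snippet above. Unlike that snippet, this version also rewrites all line endings to LF, which may or may not be what you want for files that use CRLF throughout.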