LyndaCaptionToSrtConvertor
Missing texts and invalid characters
Following this pull-request comment, please find attached two files (with renamed extensions) that can help reproduce and demonstrate the problem:
Some noticeable problems:
- The 1st subtitle (line 9 in the `.caption`; line 3 in the `.srt`) gets a line break at `]`.
- The 2nd subtitle (ln 13 / ln 7) has a `b=+` that shouldn't be there.
- After subtitle 25 (103 / 109), the `.srt` file has some intermediate characters between the subtitles.
- (...and more of the above).
I have simplified the `PrepareSrt` logic somewhat, and it now seems to work properly. Before I submit a PR, here's my version:
    // Requires: using System.IO; using System.Text.RegularExpressions;
    public string PrepareSrt()
    {
        const int METADATA_LINES = 7, CHARS_BEFORE_TIMESTAMP = 13, CHARS_AFTER_TIMESTAMP = 14;

        // Read the whole file into memory.
        string content = File.ReadAllText(filePath);

        // Discard the first lines, which contain metadata used by the Lynda desktop app to link the subtitle to the video.
        string output = RemoveFirstLines(content, METADATA_LINES);

        // Before every timestamp there is a constant number of characters (starting with [NUL][SOH] and ending with a newline).
        output = Regex.Replace(output, @"\u0000\u0001[\s\S]{" + CHARS_BEFORE_TIMESTAMP + "}[\r\n]*", "");

        // After every timestamp there is also a constant number of characters.
        output = Regex.Replace(output, @"(?<=\[\d\d:\d\d:\d\d\.\d\d\])[\s\S]{" + CHARS_AFTER_TIMESTAMP + "}", "");

        // Clean up any remaining characters outside the range U+0020 to U+007F, keeping CR and LF.
        output = Regex.Replace(output, @"[^\u0020-\u007F \u000D\n\r]+", "");

        return output;
    }
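The snippet above calls `RemoveFirstLines`, which is not shown in this thread. As a minimal sketch only, here is one way such a helper might look. The class name `CaptionTextHelpers`, the signature, and the behavior (return an empty string when the text has fewer lines than requested) are assumptions, not the repository's actual implementation.

```csharp
using System;

public static class CaptionTextHelpers
{
    // Hypothetical helper (not taken from the repository): returns `text`
    // without its first `count` lines. A line is assumed to end at '\n',
    // so both "\n" and "\r\n" line endings are handled.
    public static string RemoveFirstLines(string text, int count)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        int index = 0;
        for (int i = 0; i < count; i++)
        {
            int newline = text.IndexOf('\n', index);
            if (newline < 0)
            {
                // Fewer lines than requested: nothing is left to return.
                return string.Empty;
            }
            index = newline + 1; // Continue searching after this line break.
        }
        return text.Substring(index);
    }
}
```

Scanning with `IndexOf` avoids splitting the whole file into an array of lines, which may matter little for caption files but keeps the rest of the content byte-for-byte intact for the regex steps that follow.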
Thanks, @Dev-iL. I stopped working on this because my free Lynda subscription ended a while ago, and I don't get a free subscription from my new workplace. Maybe I will create a new account with another email address and check whether it still works.