= srtparser.h : Simple, yet powerful C++ SRT Subtitle Parser Library

A single header, simple, powerful full blown srt subtitle parser written in C++.
https://github.com/saurabhshri/simple-yet-powerful-srt-subtitle-parser-cpp[srtparser.h] is a single-header, simple and powerful C++ srt subtitle parsing library that lets you easily handle, process and manipulate srt subtitle files in your project. It is an extension of Oleksii Maryshchenko's simple https://github.com/young-developer/subtitle-parser[subtitle-parser]. It has the following features:
- It is a single header C++ (CPP) file, and can be easily used in your project.
- Focus on portability, efficiency and simplicity with no external dependency.
- A wide variety of functions at the programmer's disposal to parse an srt file as needed.
- Capable of:
* extracting and stripping HTML and other styling tags from subtitle text.
* extracting and stripping speaker names.
* extracting and stripping non-dialogue text.
- Easy to extend and add new functionalities.
== How to use srtparser.h
=== General usage ===
srtparser.h is a robust, cross-platform srt subtitle parser.
- Download srtparser.h from https://github.com/saurabhshri/simple-yet-powerful-srt-subtitle-parser-cpp
- Include the header file in your program.
#include "lib/srtparser.h"
- Create SubtitleParserFactory object. Use this factory object to create SubtitleParser object.
SubtitleParserFactory *subParserFactory = new SubtitleParserFactory("inputFile.srt");
SubtitleParser *parser = subParserFactory->getParser();
//to get subtitles
std::vector<SubtitleItem*> sub = parser->getSubtitles();
- Call appropriate functions to perform parsing.
See the demo usage in the examples directory.
=== Parser Functions ===
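The following is a complete list of available parser functions. In the SubtitleItem rows, the examples use `sub` for a pointer to a single SubtitleItem, that is, one element of the vector returned by getSubtitles().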
[cols="2,1,2,5"]
|===
| Class | Return Type | Function | Description
| SubtitleParserFactory
| SubtitleParserFactory
| SubtitleParserFactory("inputFile.srt")
| Creates a SubtitleParserFactory object. Here the inputFile.srt is the path of subtitle file to be parsed. This object is used to create parser.
E.g.: SubtitleParserFactory *subParserFactory = new SubtitleParserFactory("inputFile.srt");
| SubtitleParserFactory
| SubtitleParser*
| getParser()
| Returns the SubtitleParser object. This object will be used to parse the subtitle file.
E.g.: SubtitleParser *parser = subParserFactory->getParser();
| SubtitleParser
| std::vector<SubtitleItem*>
| getSubtitles()
| Returns the parsed subtitles as a vector of pointers to SubtitleItem objects, one per subtitle entry.
E.g.: std::vector<SubtitleItem*> sub = parser->getSubtitles();
| SubtitleParser
| std::string
| getFileData()
| Returns the complete file contents exactly as read from inputFile.srt.
E.g.: std::string fileData = parser->getFileData();
| SubtitleItem
| long int
| getStartTime()
| Returns the starting time of subtitle in milliseconds.
E.g.: long int startTime = sub->getStartTime();
| SubtitleItem
| long int
| getEndTime()
| Returns the ending time of subtitle in milliseconds.
E.g.: long int endTime = sub->getEndTime();
| SubtitleItem
| std::string
| getStartTimeString()
| Returns the starting time of subtitle in srt format.
E.g.: std::string startTime = sub->getStartTimeString();
| SubtitleItem
| std::string
| getEndTimeString()
| Returns the ending time of subtitle in srt format.
E.g.: std::string endTime = sub->getEndTimeString();
| SubtitleItem
| std::string
| getText()
| Returns the subtitle text as present in .srt file.
E.g.: std::string text = sub->getText();
| SubtitleItem
| std::string
| getDialogue(bool keepHTML, bool doNotIgnoreNonDialogues, bool doNotRemoveSpeakerNames)
| Returns the subtitle text after processing according to the parameters.
keepHTML = 1 keeps style tags instead of stripping them.
doNotIgnoreNonDialogues = 1 keeps non-dialogue text such as (laughter) instead of extracting it.
doNotRemoveSpeakerNames = 1 keeps speaker names instead of extracting them.
By default, the values (0,0,0) are passed.
E.g.: std::string text = sub->getDialogue();
| SubtitleItem
| int
| getWordCount()
| Returns the number of words present in the subtitle dialogue.
E.g.: int wordCount = sub->getWordCount();
| SubtitleItem
| std::vector<std::string>
| getIndividualWords()
| Returns string vector of individual words present in subtitle.
E.g.: std::vector<std::string> words = sub->getIndividualWords();
| SubtitleItem
| bool
| getIgnoreStatus()
| Returns the ignore status. Returns true, if the _justDialogue field i.e. subtitle after processing is empty.
E.g.: bool ignore = sub->getIgnoreStatus();
| SubtitleItem
| int
| getSpeakerCount()
| Returns the number of speakers present in the subtitle.
E.g.: int speakerCount = sub->getSpeakerCount();
| SubtitleItem
| std::vector<std::string>
| getSpeakerNames()
| Returns string vector of speaker names.
E.g.: std::vector<std::string> speakerNames = sub->getSpeakerNames();
| SubtitleItem
| int
| getNonDialogueCount()
| Returns the number of non-dialogue words present in the subtitle.
E.g.: int nonDialogueCount = sub->getNonDialogueCount();
| SubtitleItem
| std::vector<std::string>
| getNonDialogueWords()
| Returns string vector of non dialogue words.
E.g.: std::vector<std::string> nonDialogueWords = sub->getNonDialogueWords();
| SubtitleItem
| int
| getStyleTagCount()
| Returns the number of style tags present in the subtitle.
E.g.: int styleTagCount = sub->getStyleTagCount();
| SubtitleItem
| std::vector<std::string>
| getStyleTags()
| Returns string vector of style tags.
E.g.: std::vector<std::string> styleTags = sub->getStyleTags();
| SubtitleWord
| std::string
| getText()
| Returns the text stored in the SubtitleWord object.
E.g.: std::string text = word->getText(); (here word is a pointer to a SubtitleWord)
|===
== Examples
While I've tried to include examples in the above table, a compilation of all of them in a single C++ program can be found in the examples directory.
== Contributing
Suggestions, feature requests, PRs, bug reports and bug fixes are welcome. I'll be thankful.
== Credits
Built upon an MIT-licensed simple subtitle parser called LibSub-Parser by Oleksii Maryshchenko.
The original parser had three major functions: getStartTime(), getEndTime() and getText().
The rest of the work was done by Saurabh Shrivastava, originally for use in his https://saurabhshri.github.io/2017/05/gsoc/creating-a-full-blown-srt-subtitle-parser[GSoC project].