ccextractor
[BUG] or [QUESTION]: Hardsubx did not extract burned-in subtitles exactly as expected
Please prefix your issue with one of the following: [BUG], [QUESTION].
CCExtractor version: 0.94
CCExtractor detailed version info
- Git commit: 290e2f10f9e681c0ba1d53df5ba29166622b0a20
- Compilation date: 2021-12-27
Libraries used by CCExtractor
- Tesseract Version: 4.1.1
- Leptonica Version: leptonica-1.79.0
- libGPAC Version: 1.0.1
- zlib: 1.2.11
- utf8proc Version: 2.4.0
- protobuf-c Version: 1.3.1
- libpng Version: 1.6.37
- FreeType
- libhash
- nuklear
- libzvbi
In raising this issue, I confirm the following:
- [x] I have read and understood the contributors guide.
- [x] I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
- [x] I have checked that the issue I'm posting isn't already reported.
- [x] I have checked that the issue I'm posting isn't already solved and no duplicates exist in closed issues and in opened issues
- [x] I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
- [x] I have used the latest available version of CCExtractor to verify this issue exists.
- [x] I have ticked all the boxes in this section and to prove it I'm deleting the section completely to remove boilerplate text.
Necessary information
- Is this a regression (i.e. did it work before)? {DON'T KNOW}
- What platform did you use? {Linux}
- What were the used arguments?
myVideo.mp4 -ocrlang fra -hardsubx -ocr_mode frame -subcolor white -min_sub_duration 0.01 -detect_italics -whiteness_thresh 97 -conf_thresh 75
Video links
- {You can ask for an example if needed. I ran the tool on 24 videos whose burned-in subtitles differ somewhat in look and feel, and the same issues appeared every time.}
Additional information
{I have several issues:
- Timing: The extracted subtitles are not aligned in time with the burned-in subtitles. Sometimes a subtitle starts slightly before the burned-in subtitle appears (anywhere from a few frames to 2 seconds early). I tried changing the -ocr_mode parameter, but the delay did not change.
- Duration: Every extracted subtitle has a fixed duration of 1 or 2 seconds, never anything in between, shorter, or longer. As a result, the subtitle disappears before the burned-in subtitle does.
- Two-line burned-in subtitles: In the output, mostly only the second line is recognized. Otherwise it creates 2 or 3 subtitles, each time with something more inside the subtitle, but only on one line, not laid out as in the burned-in subtitles.
I don't know whether this is a bug or whether I used the parameters incorrectly, but as I said, I tried many different approaches, always with the same result. That is why I am opening an issue.
Thanks for your feedback and help.}
Please share the videos so that we can look into this issue.
On investigating, at least with the files I have, some subtitles are extracted with a duration of less than a second. So I guess the second point (fixed durations of 1 or 2 seconds) does not hold in general: either the behavior has changed since the issue was opened, or it is an artefact of the files used.
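For anyone who wants to check the duration claim on their own output, here is a minimal Python sketch that tallies cue durations in a subtitle file. It assumes the output is in SRT format with standard `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines. The helper names (`cue_durations_ms`, `bucket_durations`) and the sample cues are made up for illustration and are not part of CCExtractor.

```python
import re
from collections import Counter

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def cue_durations_ms(srt_text):
    """Return the duration of every cue in the SRT text, in milliseconds."""
    durations = []
    for line in srt_text.splitlines():
        match = TIMING.search(line)
        if match:
            fields = match.groups()
            durations.append(to_ms(*fields[4:]) - to_ms(*fields[:4]))
    return durations


def bucket_durations(durations):
    """Count cues per half-second bucket, keyed by the bucket's lower bound in seconds."""
    return Counter((d // 500) * 0.5 for d in durations)


# Hypothetical sample; replace with the contents of the real output file.
SAMPLE = """1
00:00:01,000 --> 00:00:02,000
Bonjour

2
00:00:05,200 --> 00:00:07,200
Comment allez-vous ?

3
00:00:10,000 --> 00:00:10,640
Oui
"""

durations = cue_durations_ms(SAMPLE)
for lower_bound, count in sorted(bucket_durations(durations).items()):
    print(f"{lower_bound:.1f}s to {lower_bound + 0.5:.1f}s: {count} cue(s)")
```

If nearly every cue lands in the 1.0 s and 2.0 s buckets, that would support the fixed-duration report. A spread of buckets, including ones below 1.0 s, would match what I see with my files.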
@brebetez please consider sharing the file you used
cc: @PunitLodha