[BUG] Failing to extract DVB subtitles from live stream (Failed to perform OCR)

Open jakubvojacek opened this issue 5 years ago • 2 comments

CCExtractor version (using the --version parameter preferably) : 0.87

  • [x] I have read and understood the contributors guide.
  • [x] I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • [x] I have checked that the issue I'm posting isn't already reported.
  • [x] I have checked that the issue I'm posting isn't already solved and no duplicates exist in closed issues and in opened issues.
  • [x] I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
  • [x] I have used the latest available version of CCExtractor to verify this issue exists.

My familiarity with the project is as follows (check one, eg [X] - and delete unchecked ones):

  • [ ] I have never used CCExtractor.
  • [ ] I have used CCExtractor just a couple of times.
  • [x] I absolutely love CCExtractor, but have not contributed previously.
  • [ ] I am an active contributor to CCExtractor.

Necessary information

  • Is this a regression (did it work before)? [x] NO | [ ] YES - please specify the last known working version
  • What platform did you use? [ ] Windows - [x] Linux - [x] Mac
  • What were the used arguments? It fails even with just -udp 239.1.2.3:1234 (unrelated, but originally I was testing with -ocrlang por -quant 0 -datapid 0x451 -out=webvtt -noru -trim -lf -nots -nobom -s -nofc -nogt)

Video links

tnt.ts - https://goo.gl/r4WXto

Additional information

Interestingly, when running ccextractor on the file (ccextractor tnt.ts), it does produce a tnt.srt file with correct subtitles in it. However, it also prints a whole bunch of errors.

But when tnt.ts is played out in a loop (for example tsplay tnt.ts 239.1.2.3:1234 -loop), ccextractor eventually fails. The time before it fails usually varies from a few seconds to a minute.

root@jones:~/tnt# ccextractor   -udp 239.1.2.3:1234
CCExtractor 0.87, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: Network, 239.1.2.3:1234
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

----------------------------------------------------------------------
Reading from UDP socket 239.1.2.3:1234
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixClone: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined

TessBaseAPIRecognize returned -1, skipping this bitmap.
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixClone: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined

TessBaseAPIRecognize returned -1, skipping this bitmap.
TS continuity counter not incremented prev/curr 11/6
dvbsub_decode: incomplete, broken or empty packet, remaining bytes=3249, segment_length=3490
Return from dvbsub_decode: -1
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error: In ocr_bitmap: Failed to perform OCR - Failed to get text. Please report.

Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues
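
For anyone reading the log above: the `boxClipToRectangle` and `pixClipRectangle` messages appear to come from Leptonica, the image library used underneath Tesseract. As far as I understand it, clipping a box that lies entirely outside the image yields no box at all, so no clipped image is produced, and every later call then complains that `pix` is not defined. The snippet below is only a Python sketch of that clipping rule, not Leptonica's or CCExtractor's code; the function name and the `(x, y, w, h)` tuple layout are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Hypothetical box layout for this sketch: (x, y, width, height)
Box = Tuple[int, int, int, int]


def clip_box_to_rectangle(box: Box, width: int, height: int) -> Optional[Box]:
    """Clip `box` to the rectangle with corners (0, 0) and (width, height).

    Returns None when the box does not overlap the rectangle at all,
    which is the situation the "box outside rectangle" message describes.
    """
    x, y, w, h = box
    # No overlap: the box starts past the right/bottom edge
    # or ends before the left/top edge.
    if x >= width or y >= height or x + w <= 0 or y + h <= 0:
        return None
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, width), min(y + h, height)
    return (x0, y0, x1 - x0, y1 - y0)
```

If a subtitle region's coordinates end up outside the bitmap, a rule like this returns nothing, which would be consistent with the cascade of "not defined" errors that follows each clipping error in the log.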

Can you please look into what is wrong?

Thank you,
Jakub

jakubvojacek, Oct 23 '18 06:10

@jakubvojacek Is this still a problem in current master?

cfsmp3, Jan 25 '20 23:01

Hello @cfsmp3

I just tested with the current master (5f61fae0c7dacb05e2f42d5647aafc59d3cd2ef6) and it's still happening. It's reproducible on a static file now too. If you download https://goo.gl/r4WXto, play it in VLC, and enable the Portuguese DVB subtitles, the subtitles are visible. With ccextractor (plain ccextractor tnt.ts), it throws the same errors as described above. I have attached the console output below.

root@ts:/opt/ccextractor# git rev-parse HEAD
5f61fae0c7dacb05e2f42d5647aafc59d3cd2ef6

root@ts:/opt/ccextractor# build/ccextractor /data/tnt.ts
CCExtractor 0.88, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: /data/tnt.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: No]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: /data/tnt.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixClone: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined

TessBaseAPIRecognize returned -1, skipping this bitmap.
TS continuity counter not incremented prev/curr 10/14
dvbsub_decode: incomplete, broken or empty packet, remaining bytes=2917, segment_length=3462
Return from dvbsub_decode: -1
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in boxClipToRectangle: box outside rectangle
Warning in pixClipRectangle: box doesn't overlap pix
Error in pixConvertRGBToGray: pixs not defined
Error: In ocr_bitmap: Failed to perform OCR - Failed to get text. Please report.

Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues
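
The "TS continuity counter not incremented prev/curr 10/14" line in the output above suggests that packets may be missing from the capture itself. To check that independently of ccextractor, something like the following Python sketch could scan the file. It assumes plain 188-byte transport packets that start at offset 0, does not try to resync, and ignores the discontinuity indicator in the adaptation field; the function name `continuity_gaps` is made up for this example.

```python
from typing import Dict, Iterator, Optional, Tuple

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
NULL_PID = 0x1FFF


def continuity_gaps(
    data: bytes, pid_filter: Optional[int] = None
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (packet_index, pid, previous_cc, current_cc) for each counter jump."""
    last_cc: Dict[int, int] = {}
    for index in range(len(data) // TS_PACKET_SIZE):
        pkt = data[index * TS_PACKET_SIZE:(index + 1) * TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            continue  # sketch only: skip unaligned packets instead of resyncing
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        if pid == NULL_PID:
            continue  # the counter is not meaningful on null packets
        if pid_filter is not None and pid != pid_filter:
            continue
        has_payload = bool(pkt[3] & 0x10)
        cc = pkt[3] & 0x0F
        prev = last_cc.get(pid)
        last_cc[pid] = cc
        if prev is None or not has_payload:
            continue  # the counter only advances on packets that carry payload
        expected = (prev + 1) & 0x0F
        # A repeated counter value is tolerated here as a duplicate packet.
        if cc != expected and cc != prev:
            yield (index, pid, prev, cc)
```

For the sample in this issue, one would read tnt.ts in binary mode and call `continuity_gaps(data, pid_filter=0x451)`, where 0x451 is the subtitle PID from the `-datapid` argument mentioned earlier. Any tuples it yields would point to gaps in the recording rather than in the decoder.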

jakubvojacek, Jan 26 '20 10:01