[BUG] OCR works only for first DVB subtitle stream (OCR context is not shared)

Open nikop opened this issue 5 years ago • 16 comments

CCExtractor version (using the --version parameter preferably) : 0.87

In raising this issue, I confirm the following (please check boxes, eg [X] - and delete unchecked ones):

  • [x] I have read and understood the contributors guide.
  • [x] I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • [x] I have checked that the issue I'm posting isn't already reported.
  • [x] I have checked that the issue I'm posting isn't already solved and that no duplicates exist in closed or open issues.
  • [x] I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
  • [x] I have used the latest available version of CCExtractor to verify this issue exists.

My familiarity with the project is as follows (check one, eg [X] - and delete unchecked ones):

  • [x] I have used CCExtractor just a couple of times.

Necessary information

  • Is this a regression (did it work before)? [x] NO
  • What platform did you use? [x] Windows
  • What were the used arguments? -out=srt -bom -latin1 -codec dvbsub "test.ts" -datapid 0xCE0 -ocrlang fin

Video links (replace text below with your links)

https://madjoki.com/ts/test.ts

Additional information

The following commands work, but they extract the first subtitle stream (0xCDF). This is the expected result.

-out=srt -bom -latin1 -codec dvbsub "test.ts" -ocrlang fin
-out=srt -bom -latin1 -codec dvbsub "test.ts" -datapid 0xCDF -ocrlang fin

The cause seems to be:

https://github.com/CCExtractor/ccextractor/blob/dac9de4d67523e60ed07ee0e868195f90827acd3/src/lib_ccx/ts_tables.c#L358

It works if these two lines are commented out. The intent seems to be to share the OCR context between decoders, but that sharing is never actually done.

As a result, the check at https://github.com/CCExtractor/ccextractor/blob/dac9de4d67523e60ed07ee0e868195f90827acd3/src/lib_ccx/dvb_subtitle_decoder.c#L1664

evaluates to false and OCR is skipped.
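To make the failure mode concrete, here is a minimal sketch of the behaviour described above. It is a simplified model, not the actual CCExtractor code: `program_info`, `dvb_decoder`, `init_decoder`, `can_ocr` and `second_pid_can_ocr` are hypothetical stand-ins, and the only thing taken from the real source is the `initialized_ocr` flag and the two lines that set it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real structures. */
struct program_info { int initialized_ocr; };
struct dvb_decoder  { const void *ocr_ctx; };

/* Placeholder for a real OCR handle. */
static const int dummy_ocr_handle = 1;

/* Model of decoder init: an OCR context is created only when the
 * program has not been marked as "OCR initialized" yet. */
void init_decoder(struct dvb_decoder *dec, int initialized_ocr)
{
    assert(dec != NULL);
    dec->ocr_ctx = initialized_ocr ? NULL : &dummy_ocr_handle;
}

/* Model of the decoder-side guard: without a context, OCR is skipped. */
int can_ocr(const struct dvb_decoder *dec)
{
    return dec != NULL && dec->ocr_ctx != NULL;
}

/* Sets up two subtitle PIDs in order and reports whether the second
 * one ends up with a usable OCR context. keep_flag_lines selects
 * whether the two flag-setting lines are active or commented out. */
int second_pid_can_ocr(int keep_flag_lines)
{
    struct program_info pinfo = { 0 };
    struct dvb_decoder first, second;

    init_decoder(&first, pinfo.initialized_ocr);
    if (keep_flag_lines && !pinfo.initialized_ocr)
        pinfo.initialized_ocr = 1; /* the two lines in question */
    init_decoder(&second, pinfo.initialized_ocr);

    (void)first; /* the first PID always gets a context in this model */
    return can_ocr(&second);
}
```

In this model, `second_pid_can_ocr(1)` returns 0 (the second PID has no context, so OCR is skipped), while `second_pid_can_ocr(0)` returns 1, which matches the effect of commenting out the two lines.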

nikop avatar Jan 20 '19 22:01 nikop

@nikop Do we have the same issue? This is my ticket: I am not able to extract text from the second dvbsub track, although PNG image extraction works fine for all tracks. https://github.com/CCExtractor/ccextractor/issues/1163

Murmur avatar Dec 30 '19 08:12 Murmur

@Murmur Yes, this seems to be the same issue.

nikop avatar Jan 03 '20 20:01 nikop

@Murmur

If you want to try and compile yourself:

ts_tables.diff.txt

nikop avatar Jan 05 '20 20:01 nikop

@nikop is it still happening in current master? We've done a lot of work in the past weeks and I'm going over all the issues - cleaning up. Thanks.

cfsmp3 avatar Jan 25 '20 23:01 cfsmp3

Hi, I just checked with the current master and got the same result (no captions produced with -datapid 0xCE0). It is the same problem as #1163.

mfarberbrodsky avatar Jan 26 '20 13:01 mfarberbrodsky

It still works when the two lines nikop pointed out are commented out:

if (!pinfo->initialized_ocr)
    pinfo->initialized_ocr = 1;

What's their purpose? Everything seems to be working without them.

mfarberbrodsky avatar Jan 26 '20 14:01 mfarberbrodsky

@mfarberbrodsky It declares the OCR "initialized" if it wasn't already. However, I don't think the problem is there; rather, some other place must be checking that variable and only doing something if the OCR is not initialized.

Once you've gotten that far I'd say it can't be too hard to fix.

cfsmp3 avatar Jan 26 '20 18:01 cfsmp3

@cfsmp3 I investigated this problem a bit more, and I think I found the root of the issue. It starts here. On line 357, ocr_ctx is initialized only once, while pinfo->initialized_ocr is still 0. It is then stored in the returned ptr. That ptr is written to ctx in update_capinfo, and it ends up stored as codec_private_data only for that specific pid (you can see that here), which is the first pid that contains caption data, since ocr_ctx is initialized only once. None of the other pids will have an ocr_ctx, and this is why no captions are produced for them. I believe this is why the problem occurs, but I'm not sure yet what solution I can implement.
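One possible direction, sketched under assumptions rather than taken from the codebase, is to let the program own a single OCR context and hand the same pointer to every subtitle pid. All names below (`ocr_context`, `acquire_ocr`, `attach_decoder`, and the layout of `program_info`) are hypothetical; only the `initialized_ocr` flag mirrors the real code. The sketch also assumes that one OCR context can safely be used by several decoders, which nothing in this thread confirms.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder for the real OCR state. */
struct ocr_context { int handle; };

/* Hypothetical: the program owns the single shared OCR context. */
struct program_info {
    int initialized_ocr;
    struct ocr_context ocr;
};

/* Hypothetical per-pid decoder state. */
struct dvb_decoder {
    int pid;
    struct ocr_context *ocr_ctx;
};

/* Initializes the shared context on first use and always returns it,
 * so the flag guards initialization without hiding the context. */
struct ocr_context *acquire_ocr(struct program_info *pinfo)
{
    assert(pinfo != NULL);
    if (!pinfo->initialized_ocr) {
        pinfo->ocr.handle = 1; /* placeholder for real OCR setup */
        pinfo->initialized_ocr = 1;
    }
    return &pinfo->ocr;
}

/* Every pid receives the same context pointer, not only the first. */
void attach_decoder(struct program_info *pinfo, struct dvb_decoder *dec, int pid)
{
    assert(dec != NULL);
    dec->pid = pid;
    dec->ocr_ctx = acquire_ocr(pinfo);
}
```

With this shape, attaching decoders for 0xCDF and 0xCE0 gives both a non-NULL `ocr_ctx` pointing at the same object. By contrast, commenting out the two lines presumably gives each pid its own freshly initialized context; which of the two behaviours is preferable depends on whether the OCR state may be shared.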

mfarberbrodsky avatar Jan 29 '20 18:01 mfarberbrodsky

@mfarberbrodsky That's a good investigation, good job :-)

cfsmp3 avatar Jan 29 '20 18:01 cfsmp3

Hi, I have the same issue. I tried the workaround: https://github.com/CCExtractor/ccextractor/issues/1067#issuecomment-578506743 Unfortunately, it causes another issue: the subtitle timestamps are wrong, and the first offset is null.

hamelg avatar Apr 18 '20 20:04 hamelg

No, the workaround works fine. My timestamp issue was related to my ts file.

hamelg avatar Apr 29 '20 19:04 hamelg

Has there been any workaround or solution for this, I'm seeing the same issue here:

Text #1
ID                                       : 1024 (0x400)
Menu ID                                  : 1 (0x1)
Format                                   : DVB Subtitle
Codec ID                                 : 6
Duration                                 : 32 s 800 ms
Delay relative to video                  : 10 s 0 ms
Language                                 : German

Text #2
ID                                       : 1025 (0x401)
Menu ID                                  : 1 (0x1)
Format                                   : DVB Subtitle
Codec ID                                 : 6
Duration                                 : 35 s 760 ms
Delay relative to video                  : 10 s 0 ms
Language                                 : esp

Text #3
ID                                       : 1026 (0x402)
Menu ID                                  : 1 (0x1)
Format                                   : DVB Subtitle
Codec ID                                 : 6
Duration                                 : 38 s 520 ms
Delay relative to video                  : 10 s 0 ms
Language                                 : French

Text #4
ID                                       : 1027 (0x403)
Menu ID                                  : 1 (0x1)
Format                                   : DVB Subtitle
Codec ID                                 : 6
Duration                                 : 35 s 760 ms
Delay relative to video                  : 10 s 0 ms
Language                                 : Italian

ccextractorwinfull.exe -datapid 1027 DVBSubtitles.ts
CCExtractor 0.89, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: DVBSubtitles.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: DVBSubtitles.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
100%  |  00:45
Number of NAL_type_7: 0
Number of VCL_HRD: 0
Number of NAL HRD: 0
Number of jump-in-frames: 0
Number of num_unexpected_sei_length: 0

Min PTS:                                00:00:00:576
Max PTS:                                00:00:46:336
Length:                          00:00:45:760
Done, processing time = 2 seconds

No captions were found in input.
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues

rboy1 avatar Jun 18 '21 17:06 rboy1

Has there been any workaround or solution for this, I'm seeing the same issue here:

You answered yourself :-)

cfsmp3 avatar Jun 18 '21 17:06 cfsmp3

@nikop the file is not available, do you have it somewhere?

cfsmp3 avatar Mar 22 '23 01:03 cfsmp3

@cfsmp3 I reuploaded the file

nikop avatar Mar 22 '23 17:03 nikop

@cfsmp3 I reuploaded the file

Thanks. Please leave it there until this is fixed :-)

Your original post points to code, but it uses master instead of a specific commit, so the lines you point at don't seem to match current master. Can you update that?

cfsmp3 avatar Mar 22 '23 18:03 cfsmp3