
[BUG] Not extracting dvbsub from mp4

Open rboy1 opened this issue 3 years ago • 2 comments

CCExtractor version: 0.89

CCExtractor 0.89, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
CCExtractor detailed version info
        Version: 0.89
        Git commit: 1d7589e653e73ccd9d74c10cfdd611495cdfc443
        Compilation date: 2021-06-13
        File SHA256: e5ef53bf7ebe8d8e189a5e99cf4f1d7e3301ba15fc7c5c1886c1c81b9d32896d
Libraries used by CCExtractor
        Tesseract Version: 4.00.00alpha
        Leptonica Version: leptonica-1.74 (Dec 31 2016, 10:56:23) [MSC v.1900 LIB Release x86]
        libGPAC Version: 1.0.1
        zlib: 1.2.11
        utf8proc Version: 2.4.0
        protobuf-c Version: 1.3.1
        libpng Version: 1.6.37
        FreeType
        libhash
        nuklear
        libzvbi

In raising this issue, I confirm the following:

  • [X] I have read and understood the contributors guide.
  • [X] I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • [X] I have checked that the issue I'm posting isn't already reported.
  • [X] I have checked that the issue I'm posting isn't already solved and that no duplicates exist in closed or open issues.
  • [X] I have checked the pull requests tab for existing solutions/implementations to my issue/suggestion.
  • [X] I have used the latest available version of CCExtractor to verify this issue exists.
  • [X] I have ticked all the boxes in this section and to prove it I'm deleting the section completely to remove boilerplate text.

Necessary information

  • What platform did you use? Windows
  • What were the used arguments?

Video links

https://drive.google.com/file/d/1OENPitCBBU4fXYilDkLoAO09ibtrOu5e/view?usp=sharing

Additional information

The MP4 has a forced English subtitle track that plays in VLC (e.g. at timestamp 36:24). FFmpeg also recognizes the track, but CCExtractor doesn't seem to be able to process it. I also included the tessdata 3 directory so the dvbsub could be OCR'd into SRT, but since CCExtractor doesn't recognize the track, the OCR doesn't work either.
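To see which track types the container itself declares, independently of CCExtractor and GPAC, a minimal Python sketch like the one below can walk the ISO base media file format boxes (moov > trak > mdia > hdlr) and print each track's handler type. This is not CCExtractor code: it assumes a plain, non-fragmented MP4 with a single top-level moov box, and the script and function names are hypothetical.

```python
import struct
import sys
from typing import BinaryIO, Iterator, List, Optional, Tuple


def read_moov(f: BinaryIO) -> bytes:
    """Skip top-level boxes (e.g. a multi-GB mdat) and return the moov payload."""
    while True:
        header = f.read(8)
        if len(header) < 8:
            raise ValueError("no moov box found")
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:  # 64-bit largesize follows the type field
            size = struct.unpack(">Q", f.read(8))[0]
            header_len = 16
        if box_type == b"moov":
            # size == 0 means the box runs to the end of the file
            return f.read() if size == 0 else f.read(size - header_len)
        if size == 0:
            raise ValueError("no moov box found")
        f.seek(size - header_len, 1)  # jump over the payload without reading it


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (box_type, payload_start, box_end) for each child box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header_len = 8
        if size == 1:
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header_len = 16
        elif size == 0:
            size = end - pos
        if size < header_len:
            break  # malformed box; stop instead of looping forever
        yield box_type, pos + header_len, pos + size
        pos += size


def track_handlers(moov: bytes) -> List[str]:
    """Return the hdlr handler_type (e.g. 'vide', 'soun', 'subp', 'text') of each trak."""
    handlers = []
    for box_type, start, end in iter_boxes(moov):
        if box_type != b"trak":
            continue
        for t2, s2, e2 in iter_boxes(moov, start, end):
            if t2 != b"mdia":
                continue
            for t3, s3, _ in iter_boxes(moov, s2, e2):
                if t3 == b"hdlr":
                    # FullBox header (4 bytes) + pre_defined (4 bytes), then handler_type
                    handlers.append(moov[s3 + 8:s3 + 12].decode("latin-1"))
    return handlers


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as mp4:
        for number, handler in enumerate(track_handlers(read_moov(mp4)), start=1):
            print(f"Track {number}, handler={handler}")
```

Running it as `python list_tracks.py "Dvbsub in mp4.mp4"` prints one line per track, which can be compared with the "Track N, type=…" lines in the CCExtractor logs. Seeking past the media data instead of reading it matters here because the sample is larger than 2 GB.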

Without tessdata directory

CCExtractor 0.89, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: Dvbsub in mp4.mp4
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: Dvbsub in mp4.mp4
Detected MP4 box with name: ftyp
Detected MP4 box with name: free
File seems to be a MP4
Analyzing data with GPAC (MP4 library)
Opening 'Dvbsub in mp4.mp4': [iso file] Box "dec3" (start 2211937835) has 2 extra bytes
ok
Track 1, type=vide subtype=avc1
Track 2, type=soun subtype=ec-3
Track 3, type=subp subtype=MPEG
Track 4, type=text subtype=text
MP4: found 4 tracks: 1 avc and 1 cc
Processing track 1, type=vide subtype=avc1
Processing track 2, type=soun subtype=ec-3
Processing track 3, type=subp subtype=MPEG
Processing track 4, type=text subtype=text

Unsupported track type text:text! Please report.
100%  |  00:00
Closing media: ok
Found 1 AVC track(s). Found 1 CC track(s).


Total frames time:        00:00:00:000  (0 frames at 29.97fps)

Min PTS:                                00:00:00:000
Max PTS:                                00:00:00:000
Length:                          00:00:00:000
Done, processing time = 0 seconds

No captions were found in input.
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues

With tessdata directory

The output is identical to the run above without the tessdata directory, again ending with "Unsupported track type text:text! Please report." and "No captions were found in input."

rboy1, Jun 18 '21 16:06

We don't seem to support this specific type of subs yet.

Unsupported track type text:text! Please report.

MediaInfo:

Text
        ID                 : 3
        Format             : VobSub
        Codec ID           : mp4s-E0
        Codec ID/Info      : The same subtitle format used on DVDs
        Duration           : 6 min 57 s
        Source duration    : 1 h 40 min
        Bit rate mode      : Variable
        Stream size        : 0.00 Byte (0%)
        Source stream size : 180 KiB (0%)
        Title              : American English (forced) / American English (forced)
        Language           : English
        Forced             : No
        Encoded date       : UTC 2021-06-12 01:26:32
        Tagged date        : UTC 2021-06-12 01:26:32
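The `mp4s-E0` codec ID above appears to follow MediaInfo's usual `<sample entry>-<objectTypeIndication>` naming (as in `mp4a-40-2`), i.e. an `mp4s` sample entry whose `esds` box declares objectTypeIndication 0xE0, which MediaInfo reports as VobSub. The minimal Python sketch below reads that byte from an `esds` payload following the ES_Descriptor / DecoderConfigDescriptor layout of ISO/IEC 14496-1. It is an illustration only: the function names are hypothetical, it is not CCExtractor code, and locating the `esds` box inside `stsd` is left out.

```python
from typing import Tuple


def read_descriptor_header(buf: bytes, pos: int) -> Tuple[int, int, int]:
    """Return (tag, payload_size, payload_start) for an MPEG-4 descriptor at pos.

    The size uses the expandable encoding of ISO/IEC 14496-1: up to four bytes,
    7 bits each, with the high bit set on every byte except the last.
    """
    tag = buf[pos]
    pos += 1
    size = 0
    for _ in range(4):
        byte = buf[pos]
        pos += 1
        size = (size << 7) | (byte & 0x7F)
        if not byte & 0x80:
            break
    return tag, size, pos


def object_type_indication(esds_payload: bytes) -> int:
    """Extract objectTypeIndication from an 'esds' box payload (box header already stripped)."""
    pos = 4  # skip FullBox version (1 byte) + flags (3 bytes)
    tag, _, pos = read_descriptor_header(esds_payload, pos)
    if tag != 0x03:  # ES_DescrTag
        raise ValueError(f"expected ES_Descriptor, got tag 0x{tag:02X}")
    pos += 2  # ES_ID
    flags = esds_payload[pos]
    pos += 1
    if flags & 0x80:  # streamDependenceFlag -> dependsOn_ES_ID (16 bits)
        pos += 2
    if flags & 0x40:  # URL_Flag -> URLlength (8 bits) + URLstring
        pos += 1 + esds_payload[pos]
    if flags & 0x20:  # OCRstreamFlag -> OCR_ES_Id (16 bits)
        pos += 2
    tag, _, pos = read_descriptor_header(esds_payload, pos)
    if tag != 0x04:  # DecoderConfigDescrTag
        raise ValueError(f"expected DecoderConfigDescriptor, got tag 0x{tag:02X}")
    return esds_payload[pos]  # first byte of DecoderConfigDescriptor
```

A track reported as `mp4s-E0` would be expected to yield `0xE0` here, which is the value a decoder would have to map to DVD-style (VobSub) subpictures rather than to a text track.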

canihavesomecoffee, Jun 18 '21 17:06

Sample available on https://sampleplatform.ccextractor.org/sample/172

canihavesomecoffee, Nov 20 '21 16:11