[PROPOSAL] Extract subtitles in a Chinese newscast

Open Liontooth opened this issue 6 years ago • 6 comments

The following video file was recorded in mainland China, using Joker-tv with the DTMB television standard. When I watch it, I see what look like subtitles / captions. Can CCExtractor see them? I was not able to get it to work. I did not try OCR, which may be what is required.

http://vrnewsscape.ucla.edu/dropbox/2018-01-09_2033_CN_CCTV1_%e6%96%b0%e9%97%bb1+1.mpg

Cheers, David

Liontooth avatar Jan 24 '18 22:01 Liontooth

I would like to work on this issue.

jimboH avatar Feb 20 '18 16:02 jimboH

Sure, just go ahead.

saurabhshri avatar Feb 20 '18 16:02 saurabhshri

@Liontooth @cfsmp3

The subtitles in the given video file can be extracted using the -hardsubx parameter, with the OCR language set to Simplified Chinese (chi_sim):

./ccextractor 2018-01-09_2033_CN_CCTV1_新闻1+1.mpg -hardsubx -ocrlang chi_sim

Displaying them in a video player requires compatible fonts, but they are indeed being extracted, as the image below shows.

screenshot from 2018-04-11 21-21-17

There are errors in the output due to inaccuracies of the OCR, but those are outside the scope of this issue and are the subject of a separate GSoC project.
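For anyone who wants to run the same invocation over a folder of recordings, here is a minimal Python sketch. It only wraps the command shown above; the helper names (`build_hardsubx_command`, `extract_all`), the `*.mpg` pattern, and the assumption that a HardsubX-enabled `ccextractor` binary is on the PATH are illustrative and not part of CCExtractor itself.

```python
import shutil
import subprocess
from pathlib import Path


def build_hardsubx_command(video, ocr_lang="chi_sim", binary="ccextractor"):
    """Return the argument list for the invocation used in this thread.

    Only the -hardsubx and -ocrlang flags from the comment above are used;
    `binary` is an assumption (the thread runs ./ccextractor from a build dir).
    """
    return [binary, str(video), "-hardsubx", "-ocrlang", ocr_lang]


def extract_all(folder, pattern="*.mpg"):
    """Run the command on every matching file; return {filename: exit code}."""
    results = {}
    for video in sorted(Path(folder).glob(pattern)):
        cmd = build_hardsubx_command(video)
        results[video.name] = subprocess.run(cmd).returncode
    return results


if __name__ == "__main__":
    # Hypothetical usage: process every .mpg file in the current directory.
    if shutil.which("ccextractor") is None:
        raise SystemExit("ccextractor not found on PATH")
    for name, code in extract_all(".").items():
        print(name, "ok" if code == 0 else f"failed ({code})")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain Chinese characters or a plus sign, such as the recording linked in this issue.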

thealphadollar avatar Apr 11 '18 15:04 thealphadollar

Assigned to @Abhinav95, who in turn will assign it to the GSoC student(s) he sees fit.

cfsmp3 avatar Apr 11 '18 18:04 cfsmp3

I was just browsing GSoC issues, looking to work on the Flutter project, but I think I can help out a little with this issue.

Here is the document for the Chinese national standard for DTMB (GB20600-2006), 130 pages, all in Chinese: https://www.doc88.com/p-810688531386.html

Here is a patent for hardware that can separate the audio and video signals and display subtitles on DTMB equipment: https://nxgp.cnki.net/kcms/detail?v=kxaUMs6x7-4I2jr5WTdXti3zQ9F92xu0N5Lim4gHJeVFMNAZBuVUfzvmz2LuJgb7bn8rlgaJH4AQ98pqdK9FqNLQT3L2E_Cs&uniplatform=NZKPT

Here are many recordings of DTMB television broadcasts on Bilibili (they can be downloaded with tools like you-get): https://search.bilibili.com/all?keyword=dtmb&from_source=nav_search_new

Feel free to move on, or maybe write proposals using these links. I would really like to work on this issue, but whoever solves it definitely needs to know two languages: C and Chinese. I happen to know Chinese but not much C.

fewwwww avatar Mar 08 '21 04:03 fewwwww

@fewwwww This is great, thanks! Let's hope there's a brave student who knows C and feels like doing this with your help :-)

cfsmp3 avatar Mar 28 '21 23:03 cfsmp3

Closing to keep track of our Chinese wishlist here: https://github.com/CCExtractor/ccextractor/issues/224

cfsmp3 avatar Mar 22 '23 05:03 cfsmp3