ccextractor
[PROPOSAL] Extract subtitles in a Chinese newscast
The following video file was recorded in mainland China, using Joker-tv with the DTMB television standard. When I watch it, I see what look like subtitles / captions. Can CCExtractor see them? I was not able to get it to work. I did not try OCR, which may be what is required.
http://vrnewsscape.ucla.edu/dropbox/2018-01-09_2033_CN_CCTV1_%e6%96%b0%e9%97%bb1+1.mpg
Cheers, David
I would like to work on this issue.
Sure, just go ahead.
@Liontooth @cfsmp3
The subtitles in the given video file can be extracted using the -hardsubx parameter:
./ccextractor 2018-01-09_2033_CN_CCTV1_新闻1+1.mpg -hardsubx -ocrlang chi_sim
Displaying them in a video player requires compatible fonts, but they are indeed being extracted, as the image below shows.
The output contains errors caused by OCR inaccuracies, but those are outside the scope of this issue and are the subject of a separate GSoC project.
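Since the extracted text may not render in a player without suitable fonts, a quick script can confirm that the output really contains Chinese characters. The following is only a minimal sketch, assuming the output is a UTF-8 SRT file; the helper name cjk_ratio and the sample cues are made up for illustration and are not part of CCExtractor.

```python
import re

# Matches characters in the CJK Unified Ideographs block (common Chinese characters).
CJK = re.compile(r"[\u4e00-\u9fff]")
# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")


def cjk_ratio(srt_text: str) -> float:
    """Return the fraction of non-whitespace subtitle characters that are CJK ideographs.

    Hypothetical helper for illustration only. Known limitation: a subtitle
    line made up solely of digits is mistaken for a cue counter and skipped.
    """
    total = 0
    cjk = 0
    for line in srt_text.splitlines():
        line = line.strip()
        # Skip blank lines, cue counters, and timing lines; keep only subtitle text.
        if not line or line.isdigit() or TIMING.match(line):
            continue
        chars = [c for c in line if not c.isspace()]
        total += len(chars)
        cjk += sum(1 for c in chars if CJK.match(c))
    return cjk / total if total else 0.0


# Made-up sample cues; in practice, read the .srt file produced by the extraction run.
sample = """1
00:00:01,000 --> 00:00:03,500
新闻1+1

2
00:00:04,000 --> 00:00:06,000
今日关注
"""

ratio = cjk_ratio(sample)
print(f"CJK ratio: {ratio:.2f}")
```

A ratio near zero would suggest that the OCR language was not applied or that no burned-in text was detected. A high ratio says nothing about accuracy, only that Chinese text is present.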
Assigned to @Abhinav95, who in turn will assign it to whichever GSoC student(s) he sees fit.
Just browsing GSoC issues, looking to work on the Flutter project, but I think I can help out a little bit with this issue.
Here is the document for the national standard for DTMB (GB20600-2006), 130 pages, all in Chinese: https://www.doc88.com/p-810688531386.html
Here is a patent for hardware that can separate audio and video signals and display subtitles on DTMB equipment: https://nxgp.cnki.net/kcms/detail?v=kxaUMs6x7-4I2jr5WTdXti3zQ9F92xu0N5Lim4gHJeVFMNAZBuVUfzvmz2LuJgb7bn8rlgaJH4AQ98pqdK9FqNLQT3L2E_Cs&uniplatform=NZKPT
Here are tons of recordings of DTMB television on Bilibili (they can be downloaded with tools like you-get): https://search.bilibili.com/all?keyword=dtmb&from_source=nav_search_new
Feel free to move on, or maybe write proposals using these links. I would really like to work on this issue, but whoever solves it definitely needs to know two languages: C and Chinese. I happen to know Chinese but not much C.
@fewwwww This is great, thanks! Let's hope there's a brave student that knows C and feels like doing this with your help :-)
Closing to keep track of our Chinese wishlist here: https://github.com/CCExtractor/ccextractor/issues/224