ccextractor
[PROPOSAL] Add SCC support to CEA-708 decoder
Add support for the SCC format to the CEA-708 decoder. Currently, only the SRT, SAMI, and Transcript formats are supported: https://github.com/CCExtractor/ccextractor/blob/master/src/rust/src/decoder/tv_screen.rs#L126-L134
SCC format details: http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_FORMAT.HTML
#1423
Just to be clear, I looked at the similar function write_sami(). Basically, it writes to a file, and the contents should look like the image I have embedded.
So if I want to add support for the SCC format, the extracted subtitles should look like this, right?
Scenarist_SCC V1.0
01:02:53:14 94ae 94ae 9420 9420 947a 947a 97a2 97a2 a820 68ef f26e 2068 ef6e 6be9 6e67 2029 942c 942c 8080 8080 942f 942f
01:02:55:14 942c 942c
01:03:27:29 94ae 94ae 9420 9420 94f2 94f2 c845 d92c 2054 c845 5245 ae80 942c 942c 8080 8080 942f 942f
I am working on this problem, and I will be sure to read the contributor guidelines and contact you if I get stuck.
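For reference, each hex word in these SCC samples carries two CEA-608 bytes, and each byte uses bit 7 as an odd-parity bit. In 68ef ("ho"), 'h' is 0x68, which already has an odd number of 1-bits, while 'o' is 0x6f with six 1-bits, so it is written as 0xef. A minimal Rust sketch of that encoding follows. The function names are hypothetical (not ccextractor's), and it only handles characters whose 608 code matches ASCII; some ASCII codes, such as * and \, map to accented letters in the 608 character set and would need a separate table.

```rust
/// Returns `b` (a 7-bit character or control byte) with bit 7 set or cleared
/// so that the whole byte has odd parity, as the bytes in SCC words do.
fn with_odd_parity(b: u8) -> u8 {
    let b = b & 0x7f;
    if b.count_ones() % 2 == 0 {
        b | 0x80
    } else {
        b
    }
}

/// Encodes plain text as space-separated SCC hex words (two bytes per word).
/// Assumption: only printable ASCII is kept, and it is treated as identical
/// to the 608 basic character set (not true for *, \, ^, _, `, {, |, }, ~).
fn encode_text(text: &str) -> String {
    let mut bytes: Vec<u8> = text
        .chars()
        .filter(|c| c.is_ascii() && !c.is_ascii_control())
        .map(|c| with_odd_parity(c as u8))
        .collect();
    // Every word holds two bytes, so odd-length text is padded with 0x80
    // (a null byte with its parity bit set).
    if bytes.len() % 2 == 1 {
        bytes.push(0x80);
    }
    bytes
        .chunks(2)
        .map(|pair| format!("{:02x}{:02x}", pair[0], pair[1]))
        .collect::<Vec<_>>()
        .join(" ")
}
```

With this sketch, encode_text("HEY, THERE.") returns "c845 d92c 2054 c845 5245 ae80", the same words as in the third line of the sample, including the 0x80 pad after the final period.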
Here's a sample SCC file extracted from the sample WhackedOutVideos_short.mov using a commercial tool:
sample video: https://drive.google.com/file/d/13p6HBxGXlm0BGpaS15JwCJjfnBdm_Qbm/view?usp=sharing
Scenarist_SCC V1.0
00:58:56:14 e96e 2043 616e 6164 61ae
00:58:58:19 9426 94ad 9470 4ff2 20e9 7320 f468 e973 2073 796e e368 f2ef 6ee9 7ae5 6480
00:58:59:23 9426 94ad 9470 73f4 e9e3 6b20 70ef 6be9 6e67 bf80
00:59:02:03 9426 94ad 9470 c1e3 f475 61ec ec79 2c20 f468 ef73 e520 61f2 e520 f468 e520 f2ef 6473
00:59:03:09 9426 94ad 9470 f468 e579 2075 73e5 20f4 ef20 ecef e361 f4e5 20ec ef73 f420 70e5 ef70 ece5
00:59:04:29 9426 94ad 9470 eff2 20ef 62ea e5e3 f473 20e9 6e20 7570 20f4 ef20 3132 20e6 e5e5 f480
00:59:06:17 9426 94ad 9470 efe6 2070 eff7 64e5 f2ae
00:59:08:18 9426 94ad 9470 496e 20f4 68e9 7320 e361 73e5 2c20 f468 e579 20e6 e96e 6420 f468 e973
00:59:09:19 9426 94ad 9470 6475 64e5 a773 2076 e964 e5ef 20e3 616d e5f2 61ae
00:59:12:03 9426 94ad 9470 c16e 6420 f468 e520 73f7 e561 f220 62ec e97a 7a61 f264
00:59:13:14 9426 94ad 9470 73f4 61f2 f473 2061 6761 e96e ae80
00:59:14:27 9426 94ad 9470 a862 ece5 e570 e96e 6729
00:59:18:26 9426 94ad 9470 54f7 efad f468 e9f2 6473 20ef e620 f468 e520 f7ef f2ec 6480
00:59:20:22 9426 94ad 9470 e973 20e3 ef76 e5f2 e564 2062 7920 f761 f4e5 f280
00:59:22:16 9426 94ad 9470 616e 6420 f468 e520 f2e5 73f4 20e9 7320 e3ef 76e5 f2e5 6420 6279 2075 73ae
00:59:24:23 9426 94ad 9470 54e9 6de5 20f4 ef20 6761 f468 e5f2 2075 7020 61ec ec20 f468 e520 67ef efe6 7980
00:59:26:04 9426 94ad 9470 67ef e96e 6773 adef 6e20 e6f2
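This extract uses a different caption style from the earlier sample: its lines start with 9426 (RU3, roll-up with three rows), 94ad (CR) and 9470 (a preamble address code for row 15), whereas the earlier sample is pop-on (9420 RCL, then text, then 942f EOC, with 942c EDM to clear). The sketch below, with hypothetical names, strips the parity bits and labels a handful of channel-1 control codes so that two outputs can be compared line by line. Any other code in the 0x10-0x1f range is only reported as "other control", and 8080 filler words are skipped.

```rust
/// One decoded SCC word: a (partially) named control code or text.
#[derive(Debug, PartialEq)]
enum Word {
    Control(&'static str),
    Text(String),
}

fn decode_word(hex: &str) -> Option<Word> {
    let value = u16::from_str_radix(hex, 16).ok()?;
    // Drop the odd-parity bit from both bytes.
    let hi = ((value >> 8) as u8) & 0x7f;
    let lo = (value as u8) & 0x7f;
    let word = match (hi, lo) {
        (0x00, 0x00) => return None,          // 8080 filler
        (0x14, 0x20) => Word::Control("RCL"), // resume caption loading (pop-on)
        (0x14, 0x25) => Word::Control("RU2"), // roll-up, 2 rows
        (0x14, 0x26) => Word::Control("RU3"), // roll-up, 3 rows
        (0x14, 0x27) => Word::Control("RU4"), // roll-up, 4 rows
        (0x14, 0x2c) => Word::Control("EDM"), // erase displayed memory
        (0x14, 0x2d) => Word::Control("CR"),  // carriage return (roll-up)
        (0x14, 0x2e) => Word::Control("ENM"), // erase non-displayed memory
        (0x14, 0x2f) => Word::Control("EOC"), // end of caption (swap memories)
        (0x14, 0x60..=0x7f) => Word::Control("PAC row 15"),
        (0x10..=0x1f, _) => Word::Control("other control"),
        _ => {
            // Two characters; a 0x00 pad byte is dropped.
            let text: String = [hi, lo]
                .iter()
                .filter(|&&b| b >= 0x20)
                .map(|&b| b as char)
                .collect();
            Word::Text(text)
        }
    };
    Some(word)
}

/// Decodes the data part of an SCC line, merging adjacent text words.
fn decode_line(data: &str) -> Vec<Word> {
    let mut out: Vec<Word> = Vec::new();
    for word in data.split_whitespace().filter_map(decode_word) {
        if let Word::Text(t) = &word {
            if let Some(Word::Text(prev)) = out.last_mut() {
                prev.push_str(t);
                continue;
            }
        }
        out.push(word);
    }
    out
}
```

For the 00:59:58:19-style first caption line of this extract (00:58:58:19), decode_line returns RU3, CR, PAC row 15 and the text "Or is this synchronized".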
I took a shot at adding SCC support to the 708 decoder by adding a function write_scc in tv_screen.rs. Here is the commit on my fork: https://github.com/CCExtractor/ccextractor/compare/master...voidash:master
I ran ccextractor in debug mode on the video https://drive.google.com/file/d/13p6HBxGXlm0BGpaS15JwCJjfnBdm_Qbm/view with these flags:
-in=mp4 -out=scc -nofc -dru /home/cdjk/Downloads/WhackedOutVideos_short.mov -o /home/cdjk/Downloads/main.scc -708
Here is the complete output: https://pastebin.com/58ieUtfY
Without the -708 flag, the output is a little different from #1423: https://pastebin.com/PygNqWRh
My main concern is that the Writer object is only being created for the last three lines:
[CEA-708] 00:00:30,030 00:00:30,029
[CEA-708] First: 0, Last: 29
[CEA-708] 9426 94ad 9470 616e 6420 f468 e520 f2e5 73f4 20e9 7320 e3ef 76e5 f2e5 6420 6279 2075 73ae
[CEA-708] 00:00:30,030 00:00:30,029
[CEA-708] First: 0, Last: 30
[CEA-708] 9426 94ad 9470 54e9 6de5 20f4 ef20 6761 f468 e5f2 2075 7020 61ec ec20 f468 e520 67ef efe6 7980
[CEA-708] 00:00:30,030 00:00:30,029
[CEA-708] First: 0, Last: 30
[CEA-708] 9426 94ad 9470 67ef e96e 6773 adef 6e20 e6f2 ef6d 2061 f2ef 756e 6420 f468 e520 67ec ef62 e580
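For those three lines, the start and end times are essentially the same (00:00:30,030 and 00:00:30,029, so the end is even 1 ms before the start), and the output file main.scc contains only the header:
Scenarist_SCC V1.0
However, the file main.p0.svc01.scc does contain those last three lines.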
Note: I wrote the write_scc function by looking at how write_srt and write_transcript work. If there is something I need to understand, please let me know.
@PunitLodha can you take a look at @voidash 's work?
Yes, I will in a while.
For some reason, MP4 still uses the C decoder, and switching it to Rust is not straightforward. I am working on it.
Meanwhile, @voidash, could you replicate the changes in C here: https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ccx_decoders_708_output.c#L370-L392
OK, I will take a look at it.
I tried replicating the changes in C. Here is the commit: https://github.com/voidash/ccextractor/commit/fb5dbe29593fb68146ee7deb71270f53f93f0d18
Here is the output when I passed the following parameters:
-in=mp4 -out=scc -nofc -dru /home/cdjk/Downloads/WhackedOutVideos_short.mov -o /home/cdjk/Downloads/main.scc
https://pastebin.com/VeY4BmbK
The temp file main.p0.svc01.scc is being written, and its contents look like this:
https://pastebin.com/xq6Jwfuv
But main.scc is still not being written. Judging by the console output, the caption type appears to be roll-up:
0:00:15:982 --> 00:00:17:350
In this case, they find this
dude's video camera.
And the swear blizzard
00:00:15:29 9426 94ad 9470 496e 20f4 68e9 7320 e361 73e5 2c20 f468 e579 20e6 e96e 6420 f468 e973
9426 94ad 9470 6475 64e5 a773 2076 e964 e5ef 20e3 616d e5f2 61ae
9426 94ad 9470 c16e 6420 f468 e520 73f7 e561 f220 62ec e97a 7a61 f264
00:00:17:351 --> 00:00:18:784
dude's video camera.
And the swear blizzard
starts again.
00:00:17:10 9426 94ad 9470 6475 64e5 a773 2076 e964 e5ef 20e3 616d e5f2 61ae
9426 94ad 9470 c16e 6420 f468 e520 73f7 e561 f220 62ec e97a 7a61 f264
9426 94ad 9470 73f4 61f2 f473 2061 6761 e96e ae80
Any suggestions on what I should do next?
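The frame field in those timecodes is consistent with converting the millisecond part to frames at roughly 30 fps and truncating: 15.982 s became 00:00:15:29 and 17.351 s became 00:00:17:10. Here is a sketch of that conversion, assuming a nominal 30 fps non-drop-frame count; the function name is hypothetical. For 29.97 fps material, long files may need drop-frame timecodes instead, which SCC files typically write with a semicolon before the frame field.

```rust
/// Converts milliseconds to an SCC-style non-drop-frame timecode HH:MM:SS:FF.
/// Assumption: frames are counted at a nominal 30 fps and truncated.
fn ms_to_scc_timecode(ms: u64) -> String {
    let total_seconds = ms / 1000;
    // Millisecond remainder scaled to a frame index in 0..30.
    let frame = (ms % 1000) * 30 / 1000;
    format!(
        "{:02}:{:02}:{:02}:{:02}",
        total_seconds / 3600,
        (total_seconds / 60) % 60,
        total_seconds % 60,
        frame
    )
}
```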
main.scc will be empty because it is supposed to contain the 608 subs, which are not present here. main.p0.svc01.scc is the file that is supposed to have the 708 subs, so that is correct.
But I can see some issues with the output. One is that there are multiple timestamps on the same line. Other than that, I think the clear caption command is missing; it should be present at the end time of each subtitle.
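A minimal sketch of the per-subtitle output described here: each subtitle produces its own line that starts with exactly one timecode, and a separate line at the subtitle's end time carries the clear command (942c, EDM). It assumes pop-on captions on row 15 in the style of the first sample (ENM, RCL, PAC, text, EOC, with control codes doubled), text that is already parity-encoded, and a tab between timecode and data, as the linked format description appears to specify. The function name is hypothetical.

```rust
/// Builds the two SCC lines for one pop-on caption: one that loads and
/// displays it at `start`, and one that clears the screen at `end`.
/// Assumptions: `text_words` is parity-encoded hex such as "c845 d92c",
/// the caption goes on row 15 (PAC 9470), and control codes are doubled.
fn pop_on_caption_lines(start: &str, end: &str, text_words: &str) -> [String; 2] {
    // ENM clears the off-screen buffer, RCL starts pop-on loading,
    // and EOC swaps the loaded caption onto the screen.
    let load = format!(
        "{}\t94ae 94ae 9420 9420 9470 9470 {} 942f 942f",
        start, text_words
    );
    // EDM at the end time removes the caption from the screen.
    let clear = format!("{}\t942c 942c", end);
    [load, clear]
}
```

For the first sample caption, pop_on_caption_lines("01:02:53:14", "01:02:55:14", "a820 68ef f26e 2068 ef6e 6be9 6e67 2029") yields a load line at 01:02:53:14 and the line "01:02:55:14", tab, "942c 942c", mirroring the two lines of that sample.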
The MP4 code has a different flow. We use libgpac to actually open the MP4 file, and the entry point into the decoders is different from the usual general loop.
It should be easy to change it to call the Rust code, though.
@PunitLodha, main.p0.svc01.scc now looks like this:
00:00:02:15 94ae 9420 9470 4ff2 20e9 7320 f468 e973 2073 796e e368 f2ef 6ee9 7ae5 6480
00:00:03:18 942c 942c
00:00:03:19 94ae 9420 9470 4ff2 20e9 7320 f468 e973 2073 796e e368 f2ef 6ee9 7ae5 648094ae 9420 9470 73f4 e9e3 6b20 70ef 6be9 6e67 bf80
00:00:05:28 942c 942c
You can take a look at my approach here: https://github.com/voidash/ccextractor/commit/e449557c8c6b31b73aa434c81d471818e832f5f8
Here is the pastebin for main.p0.svc01.scc: https://pastebin.com/aMiaEStY
Since the 708 decoder found SCC subs, the Scenarist_SCC V1.0 header should be added at the top of main.p0.svc01.scc.
Also, I guess I should remove the Rust code that just appends the last three captions.
I'd recommend looking into this, @PunitLodha:
https://github.com/CCExtractor/ccextractor/blob/6efa41a7e6a083e240015592189391a0f78caa37/src/lib_ccx/mp4.c#L398
If you can just call Rust from there, you're good to go. After that, everything is the same.
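Calling Rust from the C MP4 path generally means exposing a function with the C ABI and declaring a matching prototype on the C side. Below is a minimal sketch of that mechanism only; the function name, its signature and the idea of passing raw cc_data bytes are hypothetical, not ccextractor's actual interface. It relies on CEA-708 cc_data being a sequence of 3-byte constructs (a header byte carrying cc_valid and cc_type, then two data bytes) and, as a stand-in for real decoding, only counts the complete ones.

```rust
/// Hypothetical C-callable entry point for a block of cc_data.
/// `#[no_mangle]` keeps the symbol name stable for the linker, and
/// `extern "C"` makes the function use the C calling convention.
///
/// Matching C prototype (also hypothetical):
///     int hypothetical_process_cc_data(const unsigned char *data, size_t len);
#[no_mangle]
pub extern "C" fn hypothetical_process_cc_data(data: *const u8, len: usize) -> i32 {
    if data.is_null() {
        return -1;
    }
    // SAFETY: the C caller must pass a pointer to `len` readable bytes
    // that stay valid for the duration of this call.
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    // A real implementation would hand each 3-byte construct to the
    // decoder; this sketch only counts the complete ones.
    (bytes.len() / 3) as i32
}
```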
I did look at that, but due to how the code is structured, it's not as easy as just calling the Rust function from there. I'll have to change some things on the Rust side first.
@voidash
So 708 decoder found SCC subs which means Scenarist_SCC V1.0 header should be added on top of the main.p0.svc01.scc
Check out how the SAMI header is added, and do it the same way.
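The SAMI code in the repository is the reference to follow here. As a generic illustration of writing a header exactly once at the top of an output file, this sketch (hypothetical type and method names) writes Scenarist_SCC V1.0 lazily before the first caption line and separates entries with blank lines, as the examples in the linked SCC format description appear to do. Writing it lazily is a design choice: a service that never produces captions leaves its file empty. Writing the header when the file is opened would work as well.

```rust
use std::io::{self, Write};

/// Hypothetical SCC writer that emits the header once, before the first line.
struct SccWriter<W: Write> {
    out: W,
    header_written: bool,
}

impl<W: Write> SccWriter<W> {
    fn new(out: W) -> Self {
        SccWriter { out, header_written: false }
    }

    /// Writes one "timecode<TAB>words" entry followed by a blank line,
    /// preceded by the header (and its blank line) the first time only.
    fn write_line(&mut self, timecode: &str, words: &str) -> io::Result<()> {
        if !self.header_written {
            write!(self.out, "Scenarist_SCC V1.0\n\n")?;
            self.header_written = true;
        }
        write!(self.out, "{}\t{}\n\n", timecode, words)
    }
}
```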
also i guess i should remove the rust code which is just appending last three caption text
The last captions are added by the code you added in Rust; it is called by the flush function. So you should correct the Rust code too and send a PR.
If this issue has been abandoned, I could start working on this.
The mp4 code has a different flow. We use libgpac to actually open the mp4 file and the entry point into the decoders is different than the usual general loop.
It should be easy to change though and call the rust code.
Is there a video with 708 captions which is not an MP4? This might help me avoid implementing this in C and/or changing the current MP4 flow.
If this issue has been abandoned, I could start working on this.
Sure, go for it.
Is there a video with 708 captions which is not an MP4? This might help me avoid implementing this in C and/or changing the current MP4 flow.
Yes, almost any US Transport Stream. You can find plenty on our website.
#1499 details the issue with the MP4 code flow and how to fix it.
Hi, I would like to work on this issue and continue where @voidash left off. I just wanted to know what the current progress is, what is still needed to complete the feature, and a bit about how I could resolve it.
If there is any other information I should know, please tell me as well. @PunitLodha @cfsmp3