
new cc format for msnbc?

Open williamj77 opened this issue 8 months ago • 13 comments

Carlos:

Can you take a look at this ts file?

If you remember, you created a custom version of ccextractor for the Hauppauge TV tuner.

There now seem to be periods inserted into the text.

Attached are the files.

You can download the ts file here:

https://www.dropbox.com/scl/fi/4b1y86efag39sjnmm65hs/all_in_with_chris_hayes_20250326_1958.ts?rlkey=tyid6blj5hvsbyhg1mxs9nvr8&st=557jrkq8&dl=0

Again, this is from a Hauppauge tv tuner.

I have experienced issues from a potential hacker.

Sincerely, William Johnston

ccoutput.zip

williamj77 avatar Apr 03 '25 07:04 williamj77

I'm not super involved in the code these days but someone that is currently active will take a look ASAP.

cfsmp3 avatar Apr 04 '25 14:04 cfsmp3

ccx_encoders_srt.c.zip

Playing with the ccx_encoders_srt.c file, I found this "solution" that removes periods in places like the example you showed, @williamj77. However, it also strips periods from the ends of all sentences, which changes the expected output in other cases. That's why I'm not making a pull request. Still, it might help someone else refine the logic.
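One way the logic might be refined so that sentence-ending periods survive is sketched below. This is only an illustration under an unconfirmed assumption: that the unwanted periods appear at the start of words or inside them, while genuine ones follow a letter or digit and precede whitespace, a quote, or the end of the line. The function name `strip_stray_periods` is hypothetical and is not taken from ccx_encoders_srt.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of CCExtractor.
 * Removes, in place, every period that does not look like sentence
 * punctuation. A period is kept only when the previously kept character
 * is a letter or digit AND the next character is whitespace, a quote,
 * or the end of the string. Returns the new length of the string. */
size_t strip_stray_periods(char *s)
{
    assert(s != NULL);

    size_t out = 0;
    for (size_t in = 0; s[in] != '\0'; in++) {
        if (s[in] == '.') {
            unsigned char prev = out > 0 ? (unsigned char)s[out - 1] : 0;
            unsigned char next = (unsigned char)s[in + 1];
            int ends_word = isalnum(prev) != 0;
            int before_gap = next == '\0' || isspace(next) ||
                             next == '"' || next == '\'';
            if (!(ends_word && before_gap))
                continue; /* drop the stray period */
        }
        s[out++] = s[in];
    }
    s[out] = '\0';
    return out;
}
```

With this heuristic, a buffer holding "Hello. .World." becomes "Hello. World.". It is deliberately crude: abbreviations such as "U.S." and ellipses are also altered, so it would need checking against real captions before being used in the encoder.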

Image

321david123 avatar Apr 23 '25 22:04 321david123

Hello,

Is the output correct?

If so, can you send me the Windows exe?

I am more of a C#/Java developer.

Sincerely, William Johnston


williamj77 avatar Apr 24 '25 01:04 williamj77

It does work, @williamj77. Here's the complete file:

output.srt.zip

If you want to try it yourself on other files, you just have to replace the encoder with the one in my last comment and follow the standard build instructions. Hope this helps!

321david123 avatar Apr 24 '25 20:04 321david123

If the problem is with the input, we shouldn't do anything. Periods can be removed by a post-processing script if needed.

If the problem is that we're not processing the input correctly, then we should figure out what's going on. But if players such as VLC display the periods, then they're just there and there's nothing for us to fix.
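A post-processing script of that kind would normally leave the SRT structure alone and edit only the caption text. The sketch below shows one way to tell the two apart, assuming the usual SRT layout of a numeric cue counter, a timing line containing "-->", one or more text lines, and a blank separator. The function name `is_srt_text_line` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of CCExtractor.
 * Returns 1 when `line` looks like caption text in an SRT file and 0 for
 * blank lines, numeric cue counters, and "-->" timing lines. */
int is_srt_text_line(const char *line)
{
    assert(line != NULL);

    if (strstr(line, "-->") != NULL)
        return 0; /* timing line */

    int digits = 0;
    int other = 0;
    for (const char *p = line; *p != '\0'; p++) {
        if (isdigit((unsigned char)*p))
            digits++;
        else if (!isspace((unsigned char)*p))
            other++;
    }

    if (other == 0)
        return 0; /* blank line, or digits only (cue counter) */
    return 1;
}
```

A cleaner would read the file line by line, apply its period filter only where this returns 1, and write every other line through unchanged. One known limitation: a caption consisting only of digits would be mistaken for a cue counter.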

cfsmp3 avatar Apr 25 '25 15:04 cfsmp3

Carlos:

The cc is correct for a different tv channel.

And I have been experiencing hacker issues.

Again, I was wondering if the output is correct for the updated code.

Sincerely, William Johnston


williamj77 avatar Apr 25 '25 16:04 williamj77

Can anyone build and send me the exe for the updated code?

I am not a native C++ developer anymore.


williamj77 avatar Apr 28 '25 17:04 williamj77

Hello William,

Here is a binary of the latest CCExtractor code. Hope this helps :)

CCExtractor latest: https://drive.google.com/file/d/1VetQZd559QRFrGFG-_HvB3BKKRIpg39W/view?usp=share_link


vatsalkeshav avatar Apr 28 '25 20:04 vatsalkeshav

Thanks for the exe.

Can you include this updated code?


williamj77 avatar Apr 28 '25 20:04 williamj77

Sure, @321david123's code seems to output the same captions as before. It'll work great with a little refining. Until then, here's a post-processing binary for removing unnecessary periods: srt-cleaner (https://drive.google.com/file/d/1YsXf_y5mRu7JNSSFwxR9eXeVaHofQuUR/view?usp=share_link), GitHub: https://github.com/vats004/srt-cleaner

Use it like this: $ srt-cleaner your_input.srt your_output.srt

Edit: I tried running it with if (ccx_options.hauppauge_mode){//period-removing-logic}, but @321david123's logic works without that guard. Here is the binary of the code as of 29 Apr 2025 plus the period-removing logic contributed by @321david123: updated ccxr
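For reference, gating the cleanup on the Hauppauge flag could look roughly like the sketch below. Only the field name `hauppauge_mode` comes from the comment above. The `demo_options` struct, `remove_periods`, and `clean_if_hauppauge` are hypothetical stand-ins, and the period removal here is a placeholder that drops every period rather than @321david123's actual logic.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real options structure; only the one field mentioned
 * in the thread is modeled. */
struct demo_options {
    int hauppauge_mode;
};

/* Placeholder for the period-removing logic: drops every '.' in place. */
static void remove_periods(char *s)
{
    size_t out = 0;
    for (size_t in = 0; s[in] != '\0'; in++) {
        if (s[in] != '.')
            s[out++] = s[in];
    }
    s[out] = '\0';
}

/* Applies the cleanup only when Hauppauge mode is enabled, so captions
 * from other sources pass through untouched.
 * Returns 1 if the text was processed, 0 otherwise. */
int clean_if_hauppauge(const struct demo_options *opts, char *text)
{
    assert(opts != NULL && text != NULL);

    if (!opts->hauppauge_mode)
        return 0;
    remove_periods(text);
    return 1;
}
```

A guard like this would keep the change from affecting users who do not pass --hauppauge, which is the concern raised earlier about other use cases.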

vatsalkeshav avatar Apr 29 '25 09:04 vatsalkeshav

Thanks again.

But please take a look at the srt file with added periods.

Again, can you create an exe with the updated code?


williamj77 avatar Apr 29 '25 12:04 williamj77

Hi, if you're using Windows, you could just run the Docker build to test different files. Here are the instructions to run the main branch; replace \path\to\video\ with the location of your file, then copy and paste the commands into a terminal. To test another file, just run docker run --rm -v $(pwd):$(pwd) -w "$(pwd)" --user $(id -u):$(id -g) ccextractor:latest <YOURFILE> --hauppauge -o output.srt

git clone https://github.com/CCExtractor/ccextractor.git
cd ccextractor\docker
docker build --platform linux/amd64 -t ccextractor .
copy \path\to\video\all_in_with_chris_hayes_20250326_1958.ts .
docker run --rm -v $(pwd):$(pwd) -w "$(pwd)" --user $(id -u):$(id -g) ccextractor:latest all_in_with_chris_hayes_20250326_1958.ts --hauppauge -o output.srt

If you want to run it with 321david123's new SRT encoder, I've made a branch with the updated code (credit for the code goes to 321david123):

git clone https://github.com/steel-bucket/ccextractor/ -b 321david123-FIX
cd ccextractor\docker
docker build --platform linux/amd64 -t ccextractor .
copy \path\to\video\all_in_with_chris_hayes_20250326_1958.ts .
docker run --rm -v $(pwd):$(pwd) -w "$(pwd)" --user $(id -u):$(id -g) ccextractor:latest all_in_with_chris_hayes_20250326_1958.ts --hauppauge -o output.srt

This is for testing files; if there's a need for an exe, we can prepare one.

steel-bucket avatar Apr 29 '25 17:04 steel-bucket

Hello,

I don’t like docker on my machine.

Is there a workaround?


williamj77 avatar Apr 30 '25 17:04 williamj77