
OCR status summary - tests not passing, a fix broke something else

Open cfsmp3 opened this issue 3 years ago • 4 comments

Summarizing the situation here so we have all the information handy.

One of the tests has been failing for a while. Specifically, we're getting garbage in some (but not all) of the subtitle frames for one specific sample. The failing test is here:

https://sampleplatform.ccextractor.org/test/3308

We know that the guilty commit is this:

https://github.com/CCExtractor/ccextractor/commit/84a9ea5572da4728fc4ad01b88978808c925ad9f

Which itself fixed something else, so just reverting it would probably fix this test at the expense of breaking the original sample again.

I spent a bit of time on it yesterday, and it's clearly a problem with the OCR; the input images are correct. Enabling DEBUG_OCR (which writes the massaged images exactly as the OCR engine, tesseract, gets them) shows that the input contains what we expect.

So currently I suspect a problem with the internal state of the OCR engine (possibly we're not reinitializing something, who knows).
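One way to test the "something isn't reinitialized" hypothesis is to run every frame twice: once through an engine that has already processed all the earlier frames, and once through a freshly created engine. Frames whose text differs between the two runs depend on leftover state. The sketch below is only an illustration of that idea in Python; `find_state_dependent_frames`, `OcrEngine` and `LeakyToyEngine` are hypothetical names, and the toy engine stands in for the real OCR calls, which are not shown here.

```python
from typing import Callable, List, Protocol, Sequence


class OcrEngine(Protocol):
    """Minimal interface assumed for this sketch (not the real OCR API)."""

    def recognize(self, image: bytes) -> str: ...


def find_state_dependent_frames(
    make_engine: Callable[[], OcrEngine], frames: Sequence[bytes]
) -> List[int]:
    """Return indexes of frames whose text differs between a reused engine
    and a freshly created one."""
    reused = make_engine()
    suspects = []
    for index, image in enumerate(frames):
        with_history = reused.recognize(image)            # engine has seen earlier frames
        without_history = make_engine().recognize(image)  # engine starts clean
        if with_history != without_history:
            suspects.append(index)
    return suspects


class LeakyToyEngine:
    """Hypothetical stand-in for a real OCR engine: once it has seen a
    'noisy' frame it corrupts all later results, mimicking leftover state."""

    def __init__(self) -> None:
        self.poisoned = False

    def recognize(self, image: bytes) -> str:
        text = image.decode("ascii")
        result = "#" * len(text) if self.poisoned else text
        if b"noisy" in image:
            self.poisoned = True  # state that survives into the next call
        return result


frames = [b"hello", b"noisy frame", b"world", b"again"]
suspects = find_state_dependent_frames(LeakyToyEngine, frames)
print(suspects)
```

With a real engine, `make_engine` would wrap whatever initialization the OCR code performs, and the first suspect index would point at the frame just after the one that leaves bad state behind.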

Since we have all samples, the previous code, the new code, etc, I think troubleshooting this should take a reasonable amount of time (and patience).

We want to release 0.89 in the next couple of days, with 0.90 following shortly after. This should be fixed (properly) in one of the two releases.

cfsmp3, Jun 11 '21 18:06

I'm assigning this to @harrynull (don't know if he's around though, haven't seen him in a while) because he sent that commit, and to @PunitLodha since at some point this code will be rewritten in Rust anyway and Punit is working on preparing things for the Rust work.

cfsmp3, Jun 11 '21 18:06

Is the issue still not resolved? I would like to work on this issue (or any other issue).

MauryaRitesh, Dec 04 '21 05:12

Not solved, go for it.

cfsmp3, Dec 04 '21 05:12

Tested it just now. Unfortunately, still broken.

cfsmp3, Mar 22 '23 01:03