ccextractor
OCR status summary - tests not passing, a fix broke something else
Summarizing the situation here so we have all the information handy.
One of the tests has been failing for a while. Specifically, we're getting garbage in some of the subtitle frames (but not all) for one specific sample. The failing test is here:
https://sampleplatform.ccextractor.org/test/3308
We know that the guilty commit is this:
https://github.com/CCExtractor/ccextractor/commit/84a9ea5572da4728fc4ad01b88978808c925ad9f
Which itself fixed something else, so just reverting it would probably fix this test at the expense of breaking the original sample again.
I spent a bit of time on it yesterday, and it's clearly a problem with the OCR; the input images, however, are correct. Enabling DEBUG_OCR (which writes the massaged images as the OCR engine, tesseract, gets them) shows that the input contains what we expect.
So currently I suspect a problem with the internal state of the OCR (possibly we're not reinitializing something, who knows).
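To make the hypothesis concrete, here is a toy sketch of the kind of bug I have in mind: an engine handle that keeps state between frames and returns garbage when it isn't reset. Everything in it (`struct toy_ocr`, `toy_ocr_reset`, `toy_ocr_recognize`) is made up for illustration; it is not the Tesseract API and not CCExtractor code, and the real cause may well be something else.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for an OCR engine handle; NOT the Tesseract API. */
struct toy_ocr {
    char buf[128]; /* internal result buffer that persists across calls */
};

/* Clear everything the toy engine carries from one frame to the next. */
void toy_ocr_reset(struct toy_ocr *ocr)
{
    assert(ocr != NULL);
    ocr->buf[0] = '\0';
}

/*
 * "Recognize" one frame. The frame text is appended to the internal
 * buffer, so anything left over from a previous frame leaks into the
 * result unless toy_ocr_reset() was called in between.
 */
const char *toy_ocr_recognize(struct toy_ocr *ocr, const char *frame_text)
{
    assert(ocr != NULL && frame_text != NULL);
    strncat(ocr->buf, frame_text, sizeof(ocr->buf) - strlen(ocr->buf) - 1);
    return ocr->buf;
}
```

With this toy, a correct input frame still produces a wrong result whenever the previous frame's state was not cleared, which would match "input images are fine, output is garbage in some frames but not all".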
Since we have all samples, the previous code, the new code, etc, I think troubleshooting this should take a reasonable amount of time (and patience).
We want to release 0.89 in the next couple of days, with 0.90 following shortly after. This should be fixed (properly) in one of the two releases.
I'm assigning this to @harrynull (don't know if he's around though, I haven't seen him in a while) because he sent that commit, and to @PunitLodha, since at some point this code will be rewritten in Rust anyway and Punit is working on preparing things for the Rust work.
Is the issue still not resolved? I would like to work on this issue (or any other issue).
Not solved, go for it.
On Fri, Dec 3, 2021, 21:23 Ritesh Maurya @.***> wrote:
Is the issue still not resolved? I would like to work on this issue(or any other issue).
Tested it just now. Unfortunately, still broken.