poor performance of page.dispose() call

Open JackyZhangFuDan opened this issue 4 years ago • 2 comments

I'm trying to use this library to recognize text in a picture. Everything works, but the performance is unacceptable.

At the beginning, I got an error message: "Only one image can be processed at once. Please make sure you dispose of the page once your finished with it". As the message said, I need to dispose of the page object before processing the next image, so I did that, but then the performance became very poor:

  1. engine.process(....) takes only 0.021 seconds
  2. page.dispose() takes about 2 seconds...

Why does page.dispose() take so much time? My picture is no larger than 100 KB.

JackyZhangFuDan avatar Apr 14 '21 12:04 JackyZhangFuDan

I don't think the image size has anything to do with how long disposing the object takes. Instead of disposing manually, use a using statement (I'm fairly sure the page implements IDisposable, which is what makes that possible) and just create a new instance each time. After all, .NET is managed code, so the garbage collector will take care of your memory management.

ghost avatar May 02 '21 09:05 ghost

Agree with Jonathan here. I can confirm that the page object does indeed implement IDisposable and the using pattern should be used.

My guess is that you're not disposing of the other resources (pix, engine, etc.) when you're finished with them, which leaves the system in an undefined state. I'd recommend having a look at the example repo. It would also be a good idea to run the tesseract exe on an example image to double-check that tesseract itself doesn't have any issues with it.

For reference, disposing the page object calls TessBaseApiClear to effectively reset/clear the engine's state so it can process the next page/image. The following goes into quite a bit of detail if you're interested: https://stackoverflow.com/questions/51069618/the-semantics-of-tessbaseapiclear
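
As a rough illustration of the using pattern described above, here is a minimal sketch rather than a definitive implementation. It assumes the Tesseract NuGet package is referenced, that a `./tessdata` folder containing `eng.traineddata` exists, and that image paths are passed on the command line; the class name `OcrTimingSketch` is hypothetical. It also times `GetText()` separately from the whole block, which may help show where the time actually goes.

```csharp
using System;
using System.Diagnostics;
using Tesseract;

class OcrTimingSketch // hypothetical name, for illustration only
{
    static void Main(string[] args)
    {
        // Assumption: ./tessdata holds eng.traineddata. Adjust the path and language as needed.
        // The engine is created once and reused for every image.
        using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
        {
            foreach (var imagePath in args)
            {
                var total = Stopwatch.StartNew();

                using (var img = Pix.LoadFromFile(imagePath))
                using (var page = engine.Process(img))
                {
                    // Time text extraction on its own so it can be compared with the total.
                    var step = Stopwatch.StartNew();
                    var text = page.GetText();
                    Console.WriteLine($"{imagePath}: GetText took {step.ElapsedMilliseconds} ms ({text.Length} chars)");
                } // page is disposed first, then img, so the engine is free for the next image

                Console.WriteLine($"{imagePath}: total including dispose took {total.ElapsedMilliseconds} ms");
            }
        }
    }
}
```

If the total is much larger than the `GetText()` time, the difference is spent loading the image, in `Process`, or in disposal; if the two are close, the cost is in recognition rather than in `Dispose()`.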

charlesw avatar May 02 '21 22:05 charlesw