Determining justification

Status: Open · adeebr opened this issue 11 years ago · 8 comments

Hi,

I need to detect the justification of a block of text (e.g. right, centred, left). I am still pretty new to the Tesseract engine, but after doing some research I believe it is possible. References:

  • http://zdenop.github.io/tesseract-doc/class_paragraph_model.html
  • http://zdenop.github.io/tesseract-doc/ocrpara_8cpp_source.html
  • http://zdenop.github.io/tesseract-doc/namespacetesseract.html#a550970d1662b3ac5830c6a28dba676b1

I was wondering whether the ParagraphModel class is accessible through this wrapper. If so, could you please provide an example of how to use it? If not, would it be possible to include it?

Thanks

adeebr, Oct 31 '14

Unfortunately not. The problem is that this information isn't exposed through the C API in Tesseract 3.02. I'll check the latest Tesseract sources later today and see if it has since been added.

charlesw, Oct 31 '14

I just checked 3.04 (master branch) and can confirm this functionality isn't currently exposed. I'll get in contact with the Tesseract team and see if I can get it added. However, this functionality will have to wait until the official 3.04 release.

charlesw, Nov 03 '14

Thanks so much for looking into this. So I guess I can't determine the justification using Tesseract until they expose the functionality. In the meantime, do you know of any other way I might be able to detect justification?

adeebr, Nov 03 '14

If you need this now, a few possible solutions come to mind:

  • Create your own micro OCR engine in C++ and expose it through a C API or a messaging API. That way you could use Tesseract's C++ API directly.
  • Use one of the automatic binding generators that work with C++ APIs.
  • Do the first solution, but using managed C++ (C++/CLI). That saves you from having to write a C API and .NET bindings.
  • Try some other image analysis library; you might have some luck with AForge.NET or one of its derivatives.

Obviously, all of these solutions would require a fair amount of work on your part.
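As a rough stopgap along the lines of the "other image analysis" option, and assuming you can already obtain one bounding box per text line of a block (for example from a line-level page iterator, or from any other layout tool), alignment can be guessed by checking which line edge stays put: left edges for left-aligned text, right edges for right-aligned text, line midpoints for centred text. The Python sketch below is a simple edge-spread heuristic, not Tesseract's own paragraph detection (which builds much richer ParagraphModel objects). The function name `guess_justification`, the `(left, right)` input format and the 3-pixel default tolerance are all illustrative assumptions.

```python
from typing import List, Tuple


def _aligned(values: List[float], tol: float) -> bool:
    """True if every value lies within `tol` pixels of every other."""
    return max(values) - min(values) <= tol


def guess_justification(lines: List[Tuple[float, float]], tol: float = 3.0) -> str:
    """Guess the alignment of one text block from its line boxes.

    lines: one (left, right) pair of x-coordinates in pixels per text line
           (hypothetical input format, e.g. taken from line bounding boxes).
    tol:   assumed tolerance in pixels for treating edges as "lined up".
    Returns "left", "right", "center" or "unknown".
    """
    if len(lines) < 2:
        return "unknown"  # a single line gives no alignment evidence

    lefts = [left for left, _ in lines]
    rights = [right for _, right in lines]
    centers = [(left + right) / 2 for left, right in lines]

    # Shared left edges: ragged-right text, or fully justified text whose
    # right edges also line up (this tie-break assumes a left-to-right script).
    if _aligned(lefts, tol):
        return "left"
    # Shared right edges with ragged left edges: right-aligned text.
    if _aligned(rights, tol):
        return "right"
    # Neither edge lines up, but the midpoints do: centred text.
    if _aligned(centers, tol):
        return "center"
    return "unknown"


if __name__ == "__main__":
    print(guess_justification([(10, 400), (10, 380), (10, 250)]))    # left
    print(guess_justification([(50, 400), (120, 400), (200, 400)]))  # right
    print(guess_justification([(100, 300), (150, 250), (120, 280)])) # center
```

Fully justified text has aligned left edges and a ragged last line, so it comes back as "left" here. In practice the tolerance probably needs to scale with image resolution rather than stay fixed in pixels.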

charlesw, Nov 05 '14

Okay, thank you. For now I won't implement it; I'll just wait for the 3.04 release. But if it becomes urgent, I'll try the steps you mentioned.

adeebr, Nov 07 '14

I've put in an official request to have this added to Tesseract's C API (https://groups.google.com/forum/#!topic/tesseract-dev/WvJwZVJJO3M).

charlesw, Dec 07 '14

This functionality has now been added to Tesseract 3.04 (see issue 1388).

charlesw, Dec 07 '14

I just realized I haven't exposed this functionality in the wrapper yet, so I'm moving it to the next release.

charlesw, Sep 14 '15