Determining justification
Hi,
I need to detect the justification of a block of text (e.g. right, centred, left). I am still pretty new to the Tesseract engine, but after doing some research I believe that it is possible. References: http://zdenop.github.io/tesseract-doc/class_paragraph_model.html http://zdenop.github.io/tesseract-doc/ocrpara_8cpp_source.html http://zdenop.github.io/tesseract-doc/namespacetesseract.html#a550970d1662b3ac5830c6a28dba676b1
I was wondering if the paragraph model class is accessible through this wrapper. If yes, could you please provide an example of how to use it? If not, would it be possible to include it?
Thanks
Unfortunately no. The problem is that this information isn't exposed through the C API in Tesseract 3.02. Later today I'll check the latest Tesseract sources and see if it has since been added.
Just checked 3.04 (master branch) and can confirm this functionality isn't currently exposed. I'll get in contact with the Tesseract team and see if I can get it added. However, this functionality will have to wait until the official 3.04 release.
Thanks so much for looking into this. So I guess I can't determine the justification using Tesseract until they expose the functionality. In the meantime, would you happen to know of any other way I might be able to detect justification?
If you need this now, a few possible solutions come to mind:
- Create your own micro OCR engine in C++ and expose it through a C API or messaging API. That way you could use Tesseract's C++ API.
- Use one of the automatic binding generators that work with C++ APIs.
- Do the first solution, but using managed C++. That saves you from having to write a C API and .NET bindings.
- Try some other image analysis library; you might have some luck with AForge or one of its derivatives.
Obviously all of these solutions would require a fair amount of work on your part.
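As a rough stopgap that is separate from the options above, justification can sometimes be estimated from line geometry alone. The sketch below is a heuristic, not anything Tesseract or this wrapper provides: it assumes you can already obtain the left and right x-coordinates of each text line in a block from some layout analysis step, and the function name `estimate_justification` and the `tolerance` parameter are hypothetical names chosen for illustration.

```python
from statistics import pstdev


def estimate_justification(lines, tolerance=5.0):
    """Guess a block's justification from per-line (left, right) x-coordinates.

    `lines` is a list of (left, right) pixel pairs, one per text line.
    `tolerance` is the largest standard deviation (in pixels) at which an
    edge or centre line still counts as aligned. Both the input shape and
    the default threshold are assumptions of this sketch.
    """
    if len(lines) < 2:
        return "unknown"  # a single line gives no alignment evidence

    lefts = [left for left, _ in lines]
    rights = [right for _, right in lines]
    centres = [(left + right) / 2 for left, right in lines]

    # An edge is "aligned" when it barely moves from line to line.
    left_aligned = pstdev(lefts) <= tolerance
    right_aligned = pstdev(rights) <= tolerance
    centre_aligned = pstdev(centres) <= tolerance

    if left_aligned and right_aligned:
        return "justified"
    if left_aligned:
        return "left"
    if right_aligned:
        return "right"
    if centre_aligned:
        return "centred"
    return "unknown"


if __name__ == "__main__":
    block = [(10, 200), (10, 150), (11, 180)]  # ragged right edge
    print(estimate_justification(block))
```

Note that fully justified paragraphs usually end with a short last line, so in practice you may want to drop the final line of each paragraph before calling the function. Indented first lines and skewed scans will also throw the estimate off.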
Okay, thank you. I won't implement it for now; I'll just wait for the 3.04 release. But if it becomes urgent, I'll try the steps that you mentioned.
Put in an official request to have this added to Tesseract's C API (https://groups.google.com/forum/#!topic/tesseract-dev/WvJwZVJJO3M).
This functionality has now been added to tesseract 3.04 (see issue 1388).
Just realized I haven't exposed this functionality yet; moving it to the next release.