
Using PSM.AUTO_OSD or default doesn't make any difference

Open Belval opened this issue 7 years ago • 1 comments

Hi,

I noticed that the text extracted from an image is the same regardless of whether I use PSM.AUTO_OSD or the default (PSM.AUTO according to the code).

Weirder yet, AUTO_OSD (which is OCR + OSD) takes about ~~half the time~~ the same time as the default, even though the latter isn't supposed to use OSD.

And even weirder, the default does in fact perform OSD, since I can OCR images rotated by 90/180/270 degrees.

Is it possible that the comments are wrong and the default is AUTO_OSD?

Belval, Jul 25 '18 14:07

For the record: Tesseract itself is a little weak in documenting this properly. (The confusion arose during the transition from version 3 to the LSTM-based version 4.)

  1. OSD (as in DetectOrientationScript() or DetectOS()) is a legacy feature (i.e. only available with the old engine still compiled in, and not deactivated via oem=LSTM_ONLY). It also requires installing the osd.traineddata model (which contains samples from all major scripts for script detection). It is active in AUTO_OSD (as well as OSD_ONLY and SPARSE_TEXT_OSD). When active, it is used during layout analysis. That means the scripts it detects are added to the loaded languages, and the orientation it detects (a multiple of 90°) is applied, provided the confidence threshold is met (i.e. the best score is at least min_orientation_margin away from the next-best candidate).
  2. Tesseract >= 4 also has orientation and skew detection independent of that (as part of AnalyseLayout() / FindLines(); the result can be queried via Orientation() in the page iterator). This is active in PSM.AUTO (as well as AUTO_OSD, AUTO_ONLY and SPARSE_TEXT_OSD). Where OSD is active too, it runs after OSD. It does not have confidence thresholds (that I know of). Besides transposing away multiples of 90°, it can also rotate arbitrarily to deskew.

Thus, there are two different implementations with different APIs and confusing overlap in terminology. See here for a feature comparison.
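To make the distinction concrete, here is a minimal sketch that queries the two mechanisms separately through tesserocr. It assumes tesserocr is installed and osd.traineddata is available. The helper names (`query_osd`, `query_layout_orientation`, `summarize`) are made up for this example and are not part of tesserocr. As far as I know, `DetectOS()` returns a falsy value when detection cannot run, so that case is mapped to `None`.

```python
# Names of Tesseract's page Orientation enum values, indexed by integer value.
ORIENTATION_NAMES = ("PAGE_UP", "PAGE_RIGHT", "PAGE_DOWN", "PAGE_LEFT")


def query_osd(image_path):
    """Run the legacy OSD on its own; return its result dict, or None."""
    # Imported lazily so the pure helpers below work without tesserocr.
    from tesserocr import PyTessBaseAPI, PSM

    with PyTessBaseAPI(psm=PSM.OSD_ONLY) as api:
        api.SetImageFile(image_path)
        result = api.DetectOS()
    # Assumption: a failed detection yields a falsy value.
    return result or None


def query_layout_orientation(image_path):
    """Run layout analysis with OSD disallowed and read its orientation."""
    from tesserocr import PyTessBaseAPI, PSM

    with PyTessBaseAPI(psm=PSM.AUTO) as api:
        api.SetImageFile(image_path)
        page_it = api.AnalyseLayout()
        if page_it is None:
            return None
        # Read the values while the API object is still alive.
        orientation, _direction, _order, deskew_angle = page_it.Orientation()
    return {"orientation": int(orientation), "deskew_angle": float(deskew_angle)}


def summarize(osd, layout):
    """Format both results side by side (hypothetical helper)."""
    lines = []
    if osd is None:
        lines.append("OSD: not available (check osd.traineddata and the OEM mode)")
    else:
        lines.append(
            "OSD: orientation id {orientation}, confidence {oconfidence:.2f}, "
            "script {script}".format(**osd)
        )
    if layout is None:
        lines.append("Layout analysis: no result")
    else:
        name = ORIENTATION_NAMES[layout["orientation"]]
        lines.append(
            "Layout analysis: {} (deskew angle {:.4f})".format(
                name, layout["deskew_angle"]
            )
        )
    return "\n".join(lines)
```

For a hypothetical input file `page.png`, `print(summarize(query_osd("page.png"), query_layout_orientation("page.png")))` shows both results next to each other. If the OSD line reports "not available" while layout analysis still reports a rotated page, the rotation handling comes from the second mechanism only.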

Now regarding your questions:

  • Yes, you can therefore get the same results (including transposition and even rotation), irrespective of whether OSD is allowed.
  • I don't think OSD would add much CPU time. But check that OSD can run properly to begin with (osd model + OEM mode).
  • Yes, we need to improve documentation on that here.
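To check the first two points on your own images, something like the following could be used. It is only a sketch: the function names are invented for this example, and the measured time includes engine initialisation, so treat the numbers as rough. The `recognize` parameter exists so the comparison logic can be tried without tesserocr installed.

```python
import time


def recognize_with_tesserocr(image_path, psm_name):
    """OCR one image with the page segmentation mode named psm_name."""
    # Imported lazily so compare_modes() can be used with another backend.
    from tesserocr import PyTessBaseAPI, PSM

    with PyTessBaseAPI(psm=getattr(PSM, psm_name)) as api:
        api.SetImageFile(image_path)
        return api.GetUTF8Text()


def compare_modes(image_path, psm_names=("AUTO", "AUTO_OSD"),
                  recognize=recognize_with_tesserocr):
    """Return {psm_name: (text, seconds)} for each requested mode."""
    results = {}
    for name in psm_names:
        start = time.perf_counter()
        text = recognize(image_path, name)
        # Wall-clock time, including API setup inside recognize().
        results[name] = (text, time.perf_counter() - start)
    return results


def same_text(results):
    """True if every mode produced identical text."""
    texts = {text for text, _seconds in results.values()}
    return len(texts) <= 1
```

Calling `compare_modes("page.png")` on a hypothetical rotated test image and then `same_text(...)` on the result reproduces the observation from the original report. Identical text with similar timings is consistent with the layout-analysis orientation doing the work in both modes.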

bertsky, Apr 21 '20 23:04