Table lines severely degrade recognition accuracy

Open sxgkwei opened this issue 4 years ago • 1 comment

Environment

  • Tesseract Version: 4.1.1-rc2, language data: chi_sim.traineddata
  • Commit Number:
  • Platform: Windows 7 64-bit

Current Behavior:

When the image contains a table, the text inside it is not recognized correctly. The output for the table contents is almost entirely garbled.

Expected Behavior:

The text in the table should be recognized normally.

Suggested Fix:

Fix the misinterpretation of horizontal, vertical, and oblique table lines when the image is converted to text.

(Attached image: 0016)

sxgkwei avatar Dec 11 '19 08:12 sxgkwei

Issues that might make it hard for Tesseract to recognize this image:

  1. A stamp, which probably confuses the table detection.
  2. Two writing directions mixed in one block: left to right and top to bottom (Tesseract treats a table as a single block).
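
A possible workaround, not something Tesseract does itself, is to erase the table lines from the image before running OCR. The sketch below is a minimal pure-Python illustration of that idea, assuming the image has already been binarized into rows of 0 (background) and 1 (ink). The function name `remove_long_runs` and the `min_len` parameter are hypothetical names chosen for this example. It only handles horizontal and vertical lines, not oblique ones or the stamp.

```python
def remove_long_runs(img, min_len):
    """Erase horizontal and vertical ink runs of at least min_len pixels.

    img is a list of rows; 1 = ink, 0 = background. Returns a new image.
    Runs are measured on the original image, so line crossings are erased too.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    erase = [[False] * w for _ in range(h)]

    # Mark long horizontal runs.
    for y in range(h):
        x = 0
        while x < w:
            if img[y][x]:
                start = x
                while x < w and img[y][x]:
                    x += 1
                if x - start >= min_len:
                    for i in range(start, x):
                        erase[y][i] = True
            else:
                x += 1

    # Mark long vertical runs.
    for x in range(w):
        y = 0
        while y < h:
            if img[y][x]:
                start = y
                while y < h and img[y][x]:
                    y += 1
                if y - start >= min_len:
                    for j in range(start, y):
                        erase[j][x] = True
            else:
                y += 1

    return [[0 if erase[y][x] else img[y][x] for x in range(w)]
            for y in range(h)]


# Toy example: a 7x7 cell border with a 2x2 "glyph" inside.
grid = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
cleaned = remove_long_runs(grid, min_len=5)
for row in cleaned:
    print("".join("#" if v else "." for v in row))
```

On a real scan, `min_len` would need to be well above the longest stroke of any character, otherwise long strokes in the text are erased along with the table lines. Lines that are more than one pixel thick or slightly skewed would also need extra handling, for which an image-processing library is likely a better fit than this sketch.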

amitdo avatar Jun 13 '22 12:06 amitdo