Table lines severely degrade recognition accuracy
Environment
- Tesseract Version: 4.1.1-rc2, language data: chi_sim.traineddata
- Commit Number:
- Platform: Windows 7 64-bit
Current Behavior:
When the image contains a table, the text inside the table is not recognized correctly. The output for the table contents is almost entirely garbled.
Expected Behavior:
The text in the table should be recognized normally.
Suggested Fix:
Tesseract should stop misinterpreting horizontal, vertical, and oblique table lines as text when converting the image to text.
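Until something like that exists, one possible workaround is to erase the ruling lines before passing the image to Tesseract. The following is only a minimal sketch of that idea, not how Tesseract handles tables internally. It assumes the image has already been binarized into rows of 0/1 values (1 = ink), and the function name `remove_long_runs` and the `min_len` threshold are hypothetical. It handles horizontal and vertical lines only; oblique lines would need a different approach (for example deskewing first).

```python
def remove_long_runs(image, min_len):
    """Return a copy of a binary image (rows of 0/1, 1 = ink) in which every
    horizontal or vertical run of ink at least min_len pixels long is erased.

    Hypothetical helper for illustration; not part of Tesseract.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    erase = [[False] * width for _ in range(height)]

    # Mark long horizontal runs, scanning each row of the original image.
    for y in range(height):
        x = 0
        while x < width:
            if image[y][x]:
                start = x
                while x < width and image[y][x]:
                    x += 1
                if x - start >= min_len:
                    for i in range(start, x):
                        erase[y][i] = True
            else:
                x += 1

    # Mark long vertical runs, scanning each column of the original image.
    for x in range(width):
        y = 0
        while y < height:
            if image[y][x]:
                start = y
                while y < height and image[y][x]:
                    y += 1
                if y - start >= min_len:
                    for i in range(start, y):
                        erase[i][x] = True
            else:
                y += 1

    # Build the cleaned copy; pixels not marked for erasure are kept as-is.
    return [
        [0 if erase[y][x] else image[y][x] for x in range(width)]
        for y in range(height)
    ]
```

`min_len` should be clearly longer than the longest stroke of any character, otherwise parts of the text are erased as well. On real scans, the same effect is usually achieved with morphological operations from an image-processing library rather than with pure Python loops.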
Issues that might make it hard for Tesseract to recognize this image:
- A stamp which probably confuses the table detection.
- Mixing two writing directions in one block: left to right and top to bottom (Tesseract considers a table as one block).