image2csv

Pre-processing enhancement

Open artperrin opened this issue 4 years ago • 1 comments

The pre-processing function in tool.py applies some image segmentation to each region so that Tesseract can identify the number it contains. However, when the input image has a grid and fragments of that grid appear in a region, Tesseract generates an error.

error-grid

This problem forces the user to be careful when drawing the first rectangle and setting the offset, which can be very frustrating.

It seems the grid could be removed from each region with some elementary image segmentation using OpenCV. For now, I can think of two options: using a clear-border function (like imclearborder in MATLAB), or detecting the grid's lines and removing them.
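To make the clear-border idea concrete, here is a minimal, library-free sketch (not the code from tool.py). It assumes the region has already been binarized into a list of rows where non-zero means foreground; the name `clear_border` and the choice of 8-connectivity are assumptions made for illustration. With OpenCV, a similar effect could probably be obtained by flood-filling from the border pixels.

```python
from collections import deque


def clear_border(binary):
    """Return a copy of `binary` without the foreground components touching the border.

    `binary` is a list of rows; any non-zero value counts as foreground.
    """
    rows, cols = len(binary), len(binary[0])
    out = [list(row) for row in binary]
    queue = deque()

    # Seed the search with every foreground pixel lying on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and out[r][c]:
                out[r][c] = 0
                queue.append((r, c))

    # Breadth-first search: erase everything 8-connected to a border pixel.
    neighbours = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    while queue:
        r, c = queue.popleft()
        for dr, dc in neighbours:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and out[nr][nc]:
                out[nr][nc] = 0
                queue.append((nr, nc))
    return out


# Hypothetical region: a grid fragment along the top edge, a digit stroke in the middle.
region = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
cleaned = clear_border(region)
```

Note that anything connected to the border is erased, so a digit that touches a grid fragment would be removed along with it.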

artperrin, Feb 03 '21 16:02

I tested the clear-border idea: it turns out that Nitish9711 implemented this function for his project, and it does a great job!

killborder

The only problem is that it does not work when the number is inside a full grid:

killborderNo
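For the full-grid case, the other idea from the original post (detecting the grid's lines and removing them) might be worth a try. The sketch below is only an illustration under assumptions: the region is a binarized list of rows, a "grid line" is any horizontal or vertical run of foreground pixels spanning at least `min_fraction` of the region, and both `remove_long_lines` and the 0.8 default are hypothetical. With OpenCV, a morphological opening with a long, thin kernel would be a comparable way to find such lines.

```python
def remove_long_lines(binary, min_fraction=0.8):
    """Return a copy of `binary` with long horizontal/vertical foreground runs erased."""
    rows, cols = len(binary), len(binary[0])
    out = [list(row) for row in binary]

    def long_runs(line, min_len):
        """Yield (start, end) for each foreground run of at least `min_len` pixels."""
        start = None
        # The trailing 0 is a sentinel that closes a run reaching the end of the line.
        for i, value in enumerate(list(line) + [0]):
            if value and start is None:
                start = i
            elif not value and start is not None:
                if i - start >= min_len:
                    yield start, i
                start = None

    # Erase long horizontal runs, row by row.
    for r in range(rows):
        for start, end in long_runs(binary[r], min_fraction * cols):
            for c in range(start, end):
                out[r][c] = 0

    # Erase long vertical runs, column by column.
    for c in range(cols):
        column = [binary[r][c] for r in range(rows)]
        for start, end in long_runs(column, min_fraction * rows):
            for r in range(start, end):
                out[r][c] = 0
    return out


# Hypothetical cell: a full frame around a short vertical digit stroke.
cell = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
without_grid = remove_long_lines(cell)
```

The threshold is a trade-off: set too low, it would also erase long strokes of the digits themselves (a tall "1", for instance), so it would need tuning on real regions.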

artperrin, Feb 03 '21 17:02