
Question on post-processing table structure with text bounding boxes

[Open] RobAcc22 opened this issue 2 years ago · 3 comments

Hello, I am working with the table structure recognition model on table images. I extract the structure and the text, using CRAFT to detect the text bounding boxes and the table-transformer model for the table structure. To post-process the table structure predictions I use the text bounding boxes with the postprocess functions.

I encounter the following problem with this approach. For some table images in which the text in a cell is a single character, CRAFT commonly merges those individual characters, producing large text bounding boxes like in the image below (second column).

[image: 22_07_28_18_17_20_high]

The issue is that when I use these bounding boxes, some of the predicted rows are enlarged so that they contain these large OCR bounding boxes. The image below shows the raw predicted rows, without any postprocessing.

[image: Empty table-07_in_table row]

As you can see, the predicted rows are accurate. But when I combine the predicted table structure with the OCR bounding boxes, using the postprocess module and the objects_to_cells function, the rows transform into this:

[image: Empty table-07_out_rows]

I hope it is visible that there is a green dotted row that stretches from the B character to the H character, enclosing exactly that text bounding box. I have been looking into this problem and it seems to be produced in the table_structure_to_cells function, in lines 810-844 of the postprocess module.

I was wondering if you could suggest a way to improve the postprocessing operations so this does not occur, perhaps by adding a further postprocessing step or by modifying those lines of code. Alternatively, if you know of an algorithm that detects text better than CRAFT, I am also interested.
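To illustrate the kind of extra step I have in mind, here is a naive sketch (the helper name `split_box_by_columns` is mine, and I assume axis-aligned `[x1, y1, x2, y2]` boxes) that splits a text box spanning several predicted columns at the column boundaries:

```python
def split_box_by_columns(text_box, column_boxes):
    """Split an axis-aligned text box [x1, y1, x2, y2] into one piece per
    predicted column it overlaps horizontally. If it overlaps no column,
    return it unchanged."""
    x1, y1, x2, y2 = text_box
    pieces = []
    for cx1, _, cx2, _ in sorted(column_boxes):
        left, right = max(x1, cx1), min(x2, cx2)
        if right > left:  # the text box horizontally overlaps this column
            pieces.append([left, y1, right, y2])
    return pieces or [text_box]
```

Of course this only recovers per-column geometry; without re-running OCR on each piece, the text of the merged box still cannot be assigned to the correct piece.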

Many thanks in advance


RobAcc22 commented Jul 28 '22 16:07

First of all, congrats on integrating OCR with the model code. This looks very well done and we hope it inspires others to do the same!

As far as your problem with the OCR is concerned, I don't see any easy way to overcome it using post-processing. If OCR does not give you a bounding box for B and C separately, you have no way to split that large text bounding box and know where B is and where C is within the box. So then you have no way to slot B and C into their correct cells using the model output.

One thing you could do is tell the post-processing code to ignore the word bounding boxes and keep its cell bounding boxes as-is. Then you could crop your input image at each cell bounding box and pass each individually to an OCR function to get the text of each cell. It sounds like a painful solution to me but could get the job done.
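That crop-and-OCR loop could look something like the sketch below (the function name `ocr_per_cell` and the `pad` parameter are hypothetical; the OCR engine is passed in as a callable so any of the engines mentioned here could be plugged in):

```python
def ocr_per_cell(image, cell_boxes, ocr_fn, pad=2):
    """Crop the input image at each predicted cell box and run OCR on the
    crop, instead of matching pre-computed word boxes to cells.

    image      -- a PIL.Image (or anything exposing .crop((l, t, r, b)))
    cell_boxes -- list of [x1, y1, x2, y2] cell boxes from the model
    ocr_fn     -- callable mapping a cropped image to a text string,
                  e.g. lambda im: pytesseract.image_to_string(im)
    pad        -- pixels of context added around each cell before OCR
    """
    texts = []
    for x1, y1, x2, y2 in cell_boxes:
        crop = image.crop((x1 - pad, y1 - pad, x2 + pad, y2 + pad))
        texts.append(ocr_fn(crop).strip())
    return texts
```

Running one OCR call per cell is slower than one pass over the whole page, but it sidesteps the merged-box problem entirely, since each character is read inside its own cell.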

In my view, the best solution would be to get better OCR. Your case is a tricky one; it's easy to understand why OCR naively assumes that characters stacked vertically over each other belong together as a word.

You could try PyTesseract as an open-source solution. I've also been very impressed with the OCR from Azure Cognitive Services. I suggest giving these a try.

Cheers, Brandon

bsmock commented Jul 28 '22 18:07

Thanks for the answer.

The thing is, the post-processing is quite useful in some other cases, so I'd prefer to keep this step. I will try to find a way to improve the OCR bounding boxes.

Cheers, Roberto

RobAcc22 commented Aug 01 '22 12:08

Hello @RobAcc22, can you share the inference code for TSR? Thanks in advance.

zackwylde-cmd commented Sep 20 '22 09:09