LayoutReader: how to order bboxes left to right and top to bottom from OCR results such as Tesseract

Open • animebing opened this issue 2 years ago • 6 comments

Describe the model I am using (LayoutReader): Thanks for your work. I am trying to run inference with your pretrained model on OCR results from Tesseract, but I have run into some problems:

  • The input bboxes should be ordered left to right and top to bottom. How can I do this using the bbox coordinates?
  • I tried sorting the bboxes by their top-left coordinates (x, y), e.g. bboxes = sorted(bboxes, key=lambda b: (b[0], b[1])), but the results are much worse than when I use the bbox order from Tesseract directly.
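Before any sorting, the word boxes have to be pulled out of the Tesseract result. The sketch below assumes the OCR was run with pytesseract's `image_to_data(image, output_type=Output.DICT)`, which returns parallel lists keyed by `level`, `left`, `top`, `width`, `height`, `text` and others. The helper name `tesseract_words_to_boxes` is hypothetical. It keeps only word-level entries (level 5), skips empty text, and converts Tesseract's (left, top, width, height) into [x0, y0, x1, y1] while preserving Tesseract's own output order.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def tesseract_words_to_boxes(data: Dict[str, list]) -> List[Tuple[str, Box]]:
    """Convert a pytesseract image_to_data(..., output_type=Output.DICT) result
    into (word, (x0, y0, x1, y1)) pairs, keeping Tesseract's own order.

    Hypothetical helper; assumes the standard image_to_data dictionary keys.
    """
    words: List[Tuple[str, Box]] = []
    for i, text in enumerate(data["text"]):
        # Level 5 entries are single words; levels 1-4 are page/block/paragraph/line
        # rows, and whitespace-only strings carry no text.
        if data["level"][i] != 5 or not text.strip():
            continue
        x0, y0 = data["left"][i], data["top"][i]
        x1, y1 = x0 + data["width"][i], y0 + data["height"][i]
        words.append((text, (x0, y0, x1, y1)))
    return words


# Hand-made input with the same shape as image_to_data's DICT output.
sample = {
    "level": [1, 4, 5, 5, 5],
    "text": ["", "", "Hello", "world", " "],
    "left": [0, 10, 10, 60, 100],
    "top": [0, 20, 20, 22, 22],
    "width": [200, 100, 40, 45, 5],
    "height": [100, 15, 12, 12, 12],
}
print(tesseract_words_to_boxes(sample))
```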

animebing, Mar 18 '22

@animebing Which dataset are you using?

wolfshow, Mar 19 '22

I'm not using any dataset. I just use Tesseract to get OCR results from an image and then try to run inference with the provided pretrained model.

animebing, Mar 21 '22

@animebing, LayoutReader is designed for training models on your own datasets or on ReadingBank. It is better to label data from your own domain and train a model for your application.

wolfshow, Mar 21 '22

@wolfshow Do you mean the left-to-right, top-to-bottom order is customized for each dataset? If I want to evaluate the pretrained model on a real-world image using Tesseract OCR results, how should I order the bboxes to match the left-to-right, top-to-bottom order the pretrained model expects?

animebing, Mar 21 '22

LayoutReader is trained with left-to-right, top-to-bottom order, so it is recommended to follow a similar setting in real applications. I suspect the gap you observed is because bboxes are not perfectly aligned in real applications: for example, two words on the same line may differ slightly in their y coordinates.

Also, I assume the bboxes are expressed as [x0, y0, x1, y1], so if you sort the words with bboxes = sorted(bboxes, key=lambda b: (b[0], b[1])), you are actually sorting them in column-first order: by x0 first, and by y0 only to break ties.
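The effect of the sort key can be checked on a small hypothetical example: two lines of two words each ("A B" above "C D"), where words on the same line differ by a couple of pixels in y0, as described in the reply. Sorting by (x0, y0) reads down columns, while a strict (y0, x0) sort reads rows but is thrown off by the small y jitter.

```python
# Hypothetical word boxes (x0, y0, x1, y1) for two text lines: "A B" above "C D".
# Same-line words differ slightly in y0, as real OCR output often does.
boxes = {
    "A": (10, 12, 40, 30),
    "B": (60, 10, 90, 28),
    "C": (10, 50, 40, 68),
    "D": (60, 52, 90, 70),
}

# Key (x0, y0) is column-first: every word at the left edge comes before any word to its right.
column_first = sorted(boxes, key=lambda w: (boxes[w][0], boxes[w][1]))

# Key (y0, x0) is row-first, but a hard comparison on y0 is sensitive to jitter:
# "B" (y0=10) lands before "A" (y0=12) even though A is to the left on the same line.
row_first = sorted(boxes, key=lambda w: (boxes[w][1], boxes[w][0]))

print(column_first)  # ['A', 'C', 'B', 'D']
print(row_first)     # ['B', 'A', 'C', 'D']
```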

So please double-check your code and try a softer, tolerance-based mechanism when sorting the bboxes, then compare the results.
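One possible soft mechanism, sketched under assumptions: group words into lines greedily by vertical center, adding a word to the current line when its center is within `y_tol` times the smaller of its own height and the line's mean height, then read each line left to right by x0. The function name `reading_order` and the default tolerance of 0.5 are hypothetical choices, not part of LayoutReader or its preprocessing. This greedy grouping does not handle multi-column pages or strongly skewed scans.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def reading_order(boxes: Sequence[Box], y_tol: float = 0.5) -> List[int]:
    """Return indices of `boxes` in left-to-right, top-to-bottom order.

    Hypothetical helper: words whose vertical centers are close (relative to
    word height) are treated as one line instead of comparing raw y0 values.
    """
    if not boxes:
        return []
    cy = [(b[1] + b[3]) / 2 for b in boxes]           # vertical centers
    h = [max(b[3] - b[1], 1e-6) for b in boxes]       # heights, guarded against zero
    order = sorted(range(len(boxes)), key=lambda i: cy[i])

    lines: List[List[int]] = [[order[0]]]
    for i in order[1:]:
        line = lines[-1]
        line_cy = sum(cy[j] for j in line) / len(line)
        line_h = sum(h[j] for j in line) / len(line)
        # Same line if the center offset is small relative to the word height.
        if abs(cy[i] - line_cy) <= y_tol * min(h[i], line_h):
            line.append(i)
        else:
            lines.append([i])

    # Within each line, read left to right by x0.
    result: List[int] = []
    for line in lines:
        result.extend(sorted(line, key=lambda j: boxes[j][0]))
    return result


words = ["A", "B", "C", "D"]
word_boxes = [(10, 12, 40, 30), (60, 10, 90, 28), (10, 50, 40, 68), (60, 52, 90, 70)]
print([words[i] for i in reading_order(word_boxes)])  # ['A', 'B', 'C', 'D']
```

The returned indices can be used to reorder both the words and their boxes before building the model input.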

zlwang-cs, Apr 26 '22

Will you open-source the code you used to build the dataset?

Bourne-M, May 05 '22